Introduction

In this file we go through the annotations and clouds of eight nouns: horde, hoop, spot, staal, stof, schaal, blik and spoor; a more general description of their selection can be found here. For each of them, the sense distribution, a sort of confusion matrix and a description of the clouds will be shown. The descriptions can still go much deeper; for now, the priority is an overview of the possibilities and of the variation across lemmas, before going too deep into any single one. At the same time, these are still only descriptions and no conclusions are drawn from them yet, so beyond this introduction the material is not yet summarized.

Sense distribution

In all cases we have at least two homonyms, of which at least one is polysemous. The sense tags have codes described in a table with definitions at the beginning of each section, but the annotators also had the option of assigning a geen ‘none of the above’ tag, in which case they had to add an explanatory comment.

When setting up the annotation procedure, pilot batches of 40-50 concordance lines per type were collected to estimate the frequency of the senses we expected (we did have to exclude candidates because some sense was not frequent enough). The annotation of the pilot sets was not extremely thorough and the definition set was sometimes modified afterwards, so this should be kept in mind when reading the comparison between the expected distribution and the outcome of the annotations. It would be quite encouraging if the pilot estimates over samples of 40-50 tokens matched the distribution in the larger sample (especially if it is robust across batches): if (the skewness of) the sense distribution turns out to be a factor in the topology of the clouds, it is useful to know that it can be estimated from such a small sample.

The Sense distribution subsection of each section then compares the estimated sense distribution based on the pilot concordances with the one found in each batch and in the whole set of tokens. For each type, a plot shows a row of dots per batch over a line, and two more rows below the line representing the pilot-based estimate and the overall distribution. Non-triangular shapes represent tokens tagged with a given sense by the majority of the annotators (each sense has a color and each homonym a shape); triangles represent either tokens primarily annotated with the geen ‘none of the above’ tag or tokens on which the annotators did not agree at all. The dots in the batch rows represent one token each, and their transparency encodes the mean confidence after standardizing its value by annotator and lemma.

This plot is followed by a stacked barplot showing the distribution of sense tags per batch per annotator: while differences in distribution across batches may depend on the tokens they include, differences within batches relate to idiosyncratic differences between annotators.
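As a sketch, the standardization behind the transparency coding mentioned above could be implemented as follows (a minimal version in pandas; the column names are hypothetical, not necessarily those of the actual pipeline):

```python
import pandas as pd

def token_transparency(ann: pd.DataFrame) -> pd.DataFrame:
    """Z-score each confidence rating within its annotator x lemma group,
    then average per token: this is the value mapped to dot transparency."""
    z = (ann.groupby(["annotator", "lemma"])["confidence"]
            .transform(lambda s: (s - s.mean()) / s.std()))
    return (ann.assign(conf_z=z)
               .groupby(["lemma", "token"], as_index=False)["conf_z"].mean())
```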

At the end of the section a small summary of the structure of each lemma will be given; it is important to keep in mind that this is highly dependent on the corpus.

Confusion matrix

Matrices

For each type, two confusion matrices will be shown in the Confusion matrix subsection. In each of them, a row represents a majority sense (or no_agreement if there was no majority) and a column represents a sense tag. By hovering over the row names it’s possible to retrieve the definition, since the tags are not precisely transparent; columns are also grouped by homonym. Since the geen ‘none of the above’ tags can have different reasons, the annotators’ explanations were classified with the following tags:

  • between, when the annotator reported doubt between two or more of the given senses;
  • not_listed, when the explanations referred to a sense that was not contemplated in the list of senses (or not understood as such);
  • unclear, when the explanations referred to either insufficient or unclear context (or simply to difficulty understanding, such as “geen flauw idee” ‘no idea’), and
  • wrong_lemma, when they referred to an issue with lemmatization, part-of-speech tagging (including parts of proper nouns) or even spelling, so that the target didn’t actually correspond to what was meant to be annotated.

The first matrix shows raw annotation counts. Each cell gives the number of tokens with the majority sense of the row that were tagged with the sense of the column: the cell in the row of horde_1 and the column of horde_2 says how many tokens with majority sense horde_1 received some horde_2 annotation. The column totals indicate the number of tokens that were tagged with a given sense. The first descriptions only focus on which senses are confused with each other. The caption also records the proportion of tokens with a certain majority sense or homonym that received the same tag from all annotators.
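As a minimal sketch (with the same hypothetical column names as before), this matrix could be derived from a long table of annotations, one row per token per annotator:

```python
import pandas as pd

def raw_matrix(ann: pd.DataFrame) -> pd.DataFrame:
    """Count, per majority sense (rows), the tokens that received each
    sense tag (columns) at least once; append the column totals."""
    unique = ann.drop_duplicates(["token", "sense"])
    matrix = pd.crosstab(unique["majority_sense"], unique["sense"])
    matrix.loc["total"] = matrix.sum()
    return matrix
```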

The second (“weighted”) matrix shows the mean of the mean original confidences of the annotations. Suppose the row is horde_1 and the column is horde_2; to fill in that cell, for each token with majority sense horde_1 the mean of the confidences of its horde_2 annotations is computed. Since horde_2 is not the majority sense, there won’t be more than one annotation of the same token to average over; for the horde_1 column, each token would have two to four agreeing annotations, whose confidences are then averaged to one mean confidence per token per sense. The final value of the cell is the mean, across all tokens of that cell, of those mean confidences. Here it is important to take into account that the annotators had to assign senses rather than homonyms: very often, disagreement between sense annotations becomes agreement between homonym choices, and low confidence for homonyms with many senses may reflect hesitation between senses rather than doubt in general.
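The two-step averaging could look like this (same hypothetical table as above):

```python
import pandas as pd

def weighted_matrix(ann: pd.DataFrame) -> pd.DataFrame:
    """Mean of mean confidences: first average the (agreeing) annotations
    of each sense within a token, then average those means across all
    tokens that share a majority sense."""
    per_token = (ann.groupby(["majority_sense", "token", "sense"],
                             as_index=False)["confidence"].mean())
    return (per_token.groupby(["majority_sense", "sense"])["confidence"]
                     .mean().unstack(fill_value=0))
```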

Confidence values range from 0 to 5 but were presented to the annotators as a star rating: there was no option to select no stars, so one star is “minimum confidence” (0 in numbers) and the full set is “maximum confidence” (5 in numbers). In some descriptions, confidences between 0 and 1 will be considered “low”, between 2 and 3 “medium”, and 4 or 5 “high”, but in practice the great majority of the confidence ratings is high. That also means that, when looking at the mean confidences, 2.5 might be considered a medium value but is in practice quite low, below the normal mean. Therefore, values of the weighted matrix that are equal to or greater than the mean confidence of the whole type will be darker and boldened, against lighter values that are lower. This number will also be reported in the table caption, along with the median.

Discussion of examples

Concordances with no agreement between the annotators, geen ‘none of the above’ tags and unexpected confusions illustrate different sorts of challenges: sloppy annotation, conventional usages not considered in the original list of senses, creative usages and hard to parse contexts. After presenting general observations regarding the annotation of a given lemma, some of these challenging examples may be discussed.

When examples are cited, the target item is highlighted in bold and color; context words selected by annotators as cues may also be boldened where relevant. Some of the cues may be wrongly annotated: there was a bug in the annotation tool by which, upon registering context words, a word with the same wordform as the actual target but in an earlier position might be tagged in its stead. This was noticed during the annotation process and the annotators (especially those who had already sent their work) were warned, but some might not have checked their results properly. I will eventually clean these cases up (they are quite evident).

If a list of context words is given, the wordforms will be in italics accompanied by their position in parentheses: L or R to indicate whether they are to the left or right of the target, plus a number indicating the number of words in between.
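A hypothetical helper illustrating this notation (not part of the actual pipeline) could look like this:

```python
def cue_position(cue_index: int, target_index: int) -> str:
    """Render a cue's position relative to the target: 'L0' is the word
    immediately to the left, 'R2' is to the right with two words in between."""
    offset = cue_index - target_index
    assert offset != 0, "the cue cannot be the target itself"
    side = "L" if offset < 0 else "R"
    return f"{side}{abs(offset) - 1}"
```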

Nephology

For each type, the Nephology subsection discusses the quantitative and qualitative effects of the parameters on the token clouds. First, the strength of the parameters is assessed, and those that seem to have little to no effect on the variation between models are kept constant in order to compare the resulting selection; then other combinations that might provide different results are explored, and finally some combination of parameters that seems to provide “satisfying” models is kept constant in order to look at the actual effect of the less important parameters. Normally, the strongest parameters are those that select first order context features, while the second order parameters rarely make much of a difference.

To ease the descriptions, the parameters will be written in all caps and their values will follow them separated by a colon. These are:

First order part-of-speech (FOC-POS)
Can take the value FOC-POS:nav, when only nouns, adjectives and verbs were selected as first order features, or FOC-POS:all, when there was no such restriction (still, some part-of-speech tags were always ignored, such as interjections).
The tendency is to default to FOC-POS:nav, since function words are probably less informative (a kind of linguistically informed default).
First order window (FOC-WIN)
Can take the value FOC-WIN:5, when only features within a 5-5 window of the target were included, or FOC-WIN:10, when a 10-10 window was used.
The tendency is to default to FOC-WIN:10, to allow for more information; normally relying on other restrictions is enough to filter out the noise.
Positive pointwise mutual information as filter (PPMI)
Can take the value PPMI:weight when the second order vectors are weighted by the PPMI value between the first order feature they represent and the target type, PPMI:selection when only features with a positive PMI with the target type were included but the vectors were not weighted, and PPMI:no when no such filter was applied. Normally, the models with PPMI:selection are more similar to PPMI:no than to PPMI:weight and they are not considered in the initial comparisons.
The initial tendency was to default to PPMI:no, since a high PPMI value signals a feature as characteristic of the type rather than of groups of it (like a sense), but in the analyses described in this file it never performs as well as the alternatives.
Vector length (LENGTH)
Can take the values LENGTH:5000 and LENGTH:10000, when the 5000/10000 most frequent features are used as second order dimensions, or LENGTH:FOC, when the same first order dimensions are used for the second order. That means that their number and frequency depend on the result of the first order restrictions for that particular sample of tokens. Normally, while this is not an extremely strong parameter, LENGTH:FOC can make a difference against the other two, frequency-based values.
The tendency is to default to LENGTH:FOC because it should be better tailored to the specific context of the tokens in the cloud; it’s harder to compare clouds with different first order context words, but it does seem to perform better in most cases. Between the frequency-based values, LENGTH:10000 is almost always ignored, since it does not seem to make any difference with respect to LENGTH:5000. If both frequency-based settings perform very similarly, smaller numbers should be preferred (hence also the tendency to choose LENGTH:FOC, since it normally means fewer than 5000 dimensions).
Second order part-of-speech (SOC-POS)
Can take the values SOC-POS:nav or SOC-POS:all and refers to a filter on the second order dimensions. This never makes much of a difference.
The tendency is to default to SOC-POS:nav (De Pascale, 2019, pp. 62–63).
Second order window (SOC-WIN)
Can take the values SOC-WIN:4 or SOC-WIN:10 depending on whether the PPMI values for the second order vectors were computed based on a 4-4 or 10-10 window.
This parameter seems to group models for some types, but doesn’t really affect the structure of the clouds that much as far as I can see. SOC-WIN:4 models seem to be more different from each other than SOC-WIN:10 models or pairs of models with different SOC-WIN, and in some lemmas it even impacts the individual effect of strong parameters such as FOC-POS. The tendency is to default to SOC-WIN:4 (see De Pascale, 2019, pp. 62–63).

Eventually, it would be nice to reinstate sentence boundaries as a parameter (replacing, for example, SOC-POS): currently, this parameter is fixed to only count context words within sentence boundaries. The difference between LENGTH:5000 and LENGTH:10000 also seems negligible.
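For reference, if the model grid is the full cross-product of the values listed above (an assumption consistent with the counts reported below), it can be enumerated as follows; it yields exactly the 144 models per type analysed in the following sections:

```python
from itertools import product

grid = {
    "FOC-POS": ["nav", "all"],
    "FOC-WIN": ["5", "10"],
    "PPMI": ["weight", "selection", "no"],
    "LENGTH": ["FOC", "5000", "10000"],
    "SOC-POS": ["nav", "all"],
    "SOC-WIN": ["4", "10"],
}
models = [dict(zip(grid, combo)) for combo in product(*grid.values())]
assert len(models) == 144  # 2 * 2 * 3 * 3 * 2 * 2
```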

Strength of effect of parameter

While the ultimate goal is to describe how certain parameters affect the output of a model, i.e. which settings facilitate modelling which semantic phenomena, it is still relevant to identify the strength of such effects. Given a range of possible parameters, which of them make an actual difference between models, and which may be neglected and defaulted? Are these parameters always the same? How do they interact; e.g. do we need to restrict part of speech if we already use PPMI values, and vice versa?

The first way in which we can observe the strength of effects and interactions is through the NMDS representation of a distance matrix between models (level 1 of the visualization), based on pairwise procrustes analyses between the models. Models that group together in the cloud are similar to each other; parameters whose values form distinct subclouds are strong, making a difference in the representation of the tokens. Again: this does not tell us which of the values, if any, gives a better representation of anything, but it already warns us that choosing one value over another results in a very different model.
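A minimal sketch of that computation, assuming every model has been reduced to a tokens-by-2 coordinate matrix over the same tokens in the same row order (in practice models may cover slightly different token sets):

```python
import numpy as np
from scipy.spatial import procrustes
from sklearn.manifold import MDS

def pairwise_model_distances(coords: list) -> np.ndarray:
    """Procrustes disparity between every pair of model solutions."""
    n = len(coords)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            _, _, disparity = procrustes(coords[i], coords[j])
            dist[i, j] = dist[j, i] = disparity
    return dist

def level1_cloud(coords: list) -> np.ndarray:
    """Level 1 visualization: NMDS over the model-distance matrix."""
    nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
               random_state=0)
    return nmds.fit_transform(pairwise_model_distances(coords))
```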

Other plots may be used to further examine the effect of parameters on the pairwise distances between models:

  • a boxplot of pairs of models by number of shared parameters. The x-axis represents the number of shared parameters (between 0 and 5, since 6 would mean comparing each model with itself), the y-axis the distance between them, and color coding and facets may be used to indicate another parameter shared (or not) by those models;
  • a boxplot (or occasionally a scatterplot) of pairs of models that differ in only one parameter. The x-axis represents the parameter in which those models diverge, the y-axis the distance between the models, and color coding may be used to indicate another parameter shared (or not) by those models.

The colors in these boxplots normally correspond to values of the parameters. For FOC-POS, the values could be nav or all when both models of the pair share one of those values, and all-nav when one of the models is FOC-POS:nav and the other FOC-POS:all.
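The data behind these boxplots could be assembled as follows, reusing the hypothetical models list and dist matrix from the sketches above; the foc_pos field reproduces the nav / all / all-nav coding just described:

```python
from itertools import combinations

def shared(m1: dict, m2: dict) -> int:
    """Number of parameter values two distinct models have in common (0-5)."""
    return sum(m1[p] == m2[p] for p in m1)

pairs = [{"shared": shared(models[i], models[j]),
          "foc_pos": "-".join(sorted({models[i]["FOC-POS"],
                                      models[j]["FOC-POS"]})),
          "distance": dist[i, j]}
         for i, j in combinations(range(len(models)), 2)]
```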

First order filters

First order parameters filter which context words in the environment of the target will be used to model each token, while second order parameters define the shape of the vectors that represent those context words. PPMI plays an intermediate role, in the sense that PPMI:weight | PPMI:selection play a first-order filtering role against PPMI:no, but PPMI:weight also affects the second order vectors in opposition to PPMI:selection.

As a consequence, first order parameters have an effect on the number of first order context words (per token and in total), which also determines the actual length of LENGTH:FOC vectors and, in consequence, the number of tokens modelled by a certain solution: tokens that run out of context words are dropped.

In this section, two barplots show the number of remaining tokens and context words after the application of FOC-WIN, FOC-POS and PPMI (yes for PPMI:weight | PPMI:selection, no for PPMI:no), as well as a boxplot of the number of first order context words per token.
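A sketch of the double role of PPMI just described, as a selection filter and as a weight (the counts and vectors are hypothetical placeholders, not the actual corpus values):

```python
import numpy as np

def pmi(cooc: int, f_target: int, f_feature: int, total: int) -> float:
    """Pointwise mutual information between target type and context feature."""
    return float(np.log2((cooc * total) / (f_target * f_feature)))

def apply_ppmi(features: dict, mode: str) -> dict:
    """features: context word -> (PMI with the target, second order vector).
    PPMI:no keeps everything; PPMI:selection keeps only positive-PMI
    features; PPMI:weight keeps them and scales the vectors by the PMI."""
    if mode == "no":
        return {w: vec for w, (p, vec) in features.items()}
    positive = {w: (p, vec) for w, (p, vec) in features.items() if p > 0}
    if mode == "selection":
        return {w: vec for w, (p, vec) in positive.items()}
    return {w: p * vec for w, (p, vec) in positive.items()}  # "weight"
```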

Comparing token clouds

After a first observation of the strength of parameters, a set of weaker ones (normally SOC-POS:nav + SOC-WIN:4 + LENGTH:FOC and excluding PPMI:selection) is kept constant to examine the qualitative effect of the stronger ones. Given that selection, a heatmap of the distance matrix of the subset may be examined to check how different the models are to each other.

The comparison between models normally takes the following steps:

  1. describe the general look of the clouds without color coding and how they change between NMDS and t-SNE solutions;
  2. color code with homonym and sense tags and describe the revealed structure.
    This can be illustrated with sets of clouds of some configuration, like what would be seen in Level 2 of the visualization (but not interactive).

The description includes the behaviour of outliers in the NMDS solutions, if any; the separability of homonyms/senses in either kind of solution; and how many and how clear the clusters in the different t-SNE solutions are, as well as how such structures relate to the parameters under comparison. For now, individual tokens are only examined on an exceptional basis; clouds with a generally similar shape may be regarded as similar even if the actual relative positions of the tokens are not.
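The token-level solutions described here could be computed along these lines, given one model's token-by-token distance matrix:

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_solutions(token_dist: np.ndarray, perplexities=(5, 30)) -> dict:
    """2D t-SNE coordinates of the tokens at several perplexity values."""
    return {p: TSNE(n_components=2, perplexity=p, metric="precomputed",
                    init="random", random_state=0).fit_transform(token_dist)
            for p in perplexities}
```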

A certain bias should be acknowledged: certain settings tend to be preferred (for theoretical reasons sometimes, but not always) and the findings in one type definitely affect how the following ones are understood. Hopefully, time and experience will provide the tools to revise these decisions with better criteria.

At the end of the subsection a next course of action is suggested, such as promising model(s) and which tokens seem interesting to look at. That is highlighted in a nice quote block at the end of each section.


horde

The noun horde was tagged with 3 definitions, reproduced in Table 1. The homonyms are roughly equivalent to English ‘horde’ (horde_1) and ‘hurdle’, which can be a literal obstacle, particularly in sports (horde_2) or figurative (horde_3). The first homonym is estimated (based on a 40-token sample) to be much more frequent than the second, but clearly distinguishable from it. While the two senses of the second homonym are quite distinct, depending on the clarity of the context there could be some overlap in the annotations of senses within the second homonym.

Table 1. Definitions of ‘horde’.
| code | definition | example | freq |
|------|------------|---------|------|
| horde_1 | 1. bende, ordeloze groep personen | een woeste horde | 26 |
| horde_2 | 2.1. materiële hindernis, m.n. houten raamwerk gebruikt bij het hordelopen | de 400m horden bij de vrouwen | 5 |
| horde_3 | 2.2. hindernis in figuurlijke zin | een horde nemen | 8 |

Sense distribution

The sample consists of 280 tokens (7 batches) out of 3224 occurrences in the QLVLNewsCorpus; the distribution of the majority senses of each batch, as well as the pilot-based estimate and the overall distribution, are reproduced in Figure 1. The distributions of the annotations (not majority senses) by annotator are shown in Figure 2. Batch 3 was annotated by 4 annotators.

As estimated, the first homonym tends to be the most frequent, with at least half the tokens of each batch, and between the senses of the second homonym, the figurative one tends to be more frequent. The only exception is the seventh batch, where a huge majority of the tokens was tagged with the literal “hurdle” sense: there was a particularly large number of sports articles in this batch. In any case, the overall distribution is very similar to the estimated one.

Figure 1. Distribution of majority senses of ‘horde’ per batch

According to Figure 2, the sense distribution was also quite constant across annotators within each batch.

Figure 2. Distribution of sense annotations of ‘horde’ per annotator, grouped by batch.

“horde” is a noun with two homonyms of very different frequencies, where the least frequent homonym is polysemous with two senses of similar frequency.

Confusion matrix

Matrices

The confusion matrix between the majority senses and other tagged senses can be seen in Table 2 (raw number of tokens with such senses assigned) and Table 3 (mean confidence of such sense annotation in each token).

We would expect no confusion between horde_1 ‘horde’ on one side and the others (literal and figurative ‘hurdle’) on the other. There are, however, 7 (out of 280) tokens where homonyms are confused, which will be discussed in the “Examples” subsection along with the tokens with no agreement and other cases of geen ‘none of the above’.

Table 2. Non weighted sense matrix of ‘horde’ senses. Proportion of tokens with full agreement per sense-tag is: horde_1: 0.89, horde_2: 0.88, horde_3: 0.86. Proportion of tokens with full agreement per homonym is: horde: 0.89, hurdle: 0.95.
| senses | horde_1 (horde) | horde_2 (hurdle) | horde_3 (hurdle) | between (geen) | not_listed (geen) | unclear (geen) |
|--------|------|------|------|------|------|------|
| horde_1 | 168 | 3 | 3 | 0 | 10 | 3 |
| horde_2 | 1 | 57 | 6 | 0 | 0 | 1 |
| horde_3 | 0 | 6 | 51 | 0 | 1 | 0 |
| unclear | 0 | 0 | 1 | 0 | 0 | 1 |
| no_agreement | 0 | 3 | 3 | 1 | 0 | 1 |
| total | 169 | 69 | 64 | 1 | 11 | 6 |

The weighted matrix shows that there is a relatively high mean confidence in the annotations that became majority senses, and a relatively lower one in the disagreeing annotations.

Table 3. Weighted sense matrix of ‘horde’ senses. Mean confidence across the lemma is 4.52; values above are darker and boldened. Median confidence across the lemma is 5.
| senses | horde_1 (horde) | horde_2 (hurdle) | horde_3 (hurdle) | between (geen) | not_listed (geen) | unclear (geen) |
|--------|------|------|------|------|------|------|
| horde_1 | 4.58 | 3.33 | 3 | 0 | 4 | 3 |
| horde_2 | 5 | 4.69 | 3.17 | 0 | 0 | 0 |
| horde_3 | 0 | 3.5 | 4.33 | 0 | 0 | 0 |
| unclear | 0 | 0 | 2 | 0 | 0 | 1.5 |
| no_agreement | 0 | 4 | 3.5 | 4 | 0 | 4 |

Examples

Among the challenging concordances of this lemma, there are instances of inattentive annotation, additional conventional usages, creative usages and hard to parse contexts.

Inattentive annotation: In some cases, the sense of the target in a concordance is quite straightforward and the alternatives can be disregarded. That is the case of all concordances with horde_1 ‘horde’ as majority sense and an alternative from the other homonym, of those with unclear as minority sense (except (4)), and of two of the cases with not_listed as minority sense. (1), for example, was tagged as horde_1 ‘horde’ by the majority (with maximum and medium confidence) and with horde_3 ‘hurdle-figurative’ as alternative (with maximum confidence). The inverse situation is seen in (2), the one case with majority horde_2 ‘hurdle-literal’ and alternative sense horde_1 ‘horde’, with maximum confidence. Given the financial context, it is rather an instance of horde_3 ‘hurdle-figurative’.

  1. de wereld moest zijn en beweerde dat een ` sterk Duitsland essentieel is om de Aziatische horden in te dammen ’ kreeg hij veel handen op elkaar . Zijn aanhangers vergeleken
    had to be […] the world and claimed that a ‘strong Germany is essential to contain the Asian hordes’ he was applauded. His supporters compared
  2. van ABN Amro signaleert een vangnet aan de onderkant rond de 575 punten en een ’ horde ’ op 610 . " Breekt de AEX door het niveau van 610 heen
    ABN Amro signals a safety net below around the 575 points and a ‘hurdle’ at 610. "The AEX breaks through the level of 610,

Additional conventional usages: Some annotations reveal conventional usages of the target lemma that were not contemplated in the original list of senses but were attested by the annotators. In the case of horde, the definition of horde_1 ‘horde’ as “bende, ordeloze groep personen” ‘gang, unordered group of people’ was too restrictive for some annotators, who resisted applying the tag to instances where the members of the horde were not human or, in one case, not unordered. Three annotators, in different batches, diverged from their colleagues in three tokens each, assigning geen ‘none of the above’ instead of horde_1 ‘horde’. For one of them, the members of the hordes were different kinds of insects, like in (3); for another, one case involved grasshoppers, one was not specified and the third was a “Mongoolse horde” ‘Mongolian horde’, where the annotator explained that this was not an unordered group of people (disregarding that there is a conceptualization of the group as ‘unordered’ in the history/meaning of the term). (4) is also a case of a horde of insects, but it received a different annotation: the annotator that assigned geen ‘none of the above’ only reported insufficient context, and it was one of the majority annotators, who assigned horde_1, that added a comment acknowledging that it was a group of animals instead of people (beetles, to be more precise).

  3. de winkel uit talloze soorten moest kiezen . Het ging mij toen om een horde weerzinwekkende kakkerlakken die mij de avond tevoren geteisterd had . Ze vonden ineens de
    the shop […] had to choose among countless kinds. Back then I was concerned with a horde of disgusting cockroaches that had infested me the previous evening. At once they found
  4. kwam pas in de jaren twintig , vanuit Bordeaux . Rond 1940 was de horde hier . De coloradokever is ongeveer een centimeter lang , de larven met hun
    came only in the ’20s, from Bordeaux. About 1940 the horde was here. The Colorado beetle is about one centimeter long, the larvae with their

Creative usages: The third annotator that resisted assigning horde_1 ‘horde’ to groups not made of people had to deal with more creative examples: two cases of vehicles (“aanstormende vrachtwagens” ‘onrushing trucks’ and “KTM’s”, race motorcycles) and one of “danceprojecten” ‘dance projects’.

Hard to parse contexts: Finally, some concordances are just too vague to make a decision between the senses. That is the case of the three tokens with no agreement and of the one with unclear as majority sense. The former show hesitation between horde_2 ‘hurdle-literal’ and horde_3 ‘hurdle-figurative’ in sports contexts where it is not clear enough which sport is being talked about. An example is (5), where the four annotators were equally split between the options; only the context word “set” suggests it’s about tennis (and it was selected as a cue by the annotators that chose the figurative sense, and only by them). In the other two cases there is no such disambiguating clue.

The token with unclear as majority sense is instead a fragment of instructions for a crossword and therefore makes a point of not giving much disambiguating information; the bullet point the target is part of just reads “bijbelse naam, horde” ‘biblical name, horde’.

  5. begonnen . " Met Krajicek en Sluiter nam Sjeng Schalken even vlot de eerste horde . De Limburger zette de Spanjaard Blanco in drie sets weg , maar nu
    begun. "As with Krajicek and Sluiter, Sjeng Schalken took the first hurdle just as smoothly. The Limburger dispatched the Spaniard Blanco in three sets, but now

Nephology of horde

A first impression of the clouds relates to the stress values of the dimensionality reduction and to the parameters that make the strongest distinctions between models. We have 144 models of horde, created on 10/03/2020 and modelling between 262 and 279 tokens. The stress value of the NMDS solution for the cloud of models is 0.161.
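The stress values reported here and below are presumably Kruskal's stress-1, which compares the distances in the low-dimensional solution with the disparities obtained by monotone regression on the original dissimilarities:

$$\text{Stress-1} = \sqrt{\frac{\sum_{i<j}\left(d_{ij} - \hat{d}_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}}$$

where $d_{ij}$ is the distance between items $i$ and $j$ in the solution and $\hat{d}_{ij}$ the corresponding disparity; values closer to 0 indicate a more faithful representation.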

Strength of parameters

The main division of the cloud of models is given by a combination of FOC-POS and PPMI, with one half made of the PPMI:weight | FOC-POS:nav models (i.e. with either setting), split mainly by PPMI and then by FOC-WIN, and the other half made of the rest, split by PPMI (Figure 3); second order parameters seem to have a minimal effect. The stress values of the NMDS solutions of these models range between 0.105 and 0.254.

Figure 3. Cloud of models of ‘horde’. Explore it at https://montesmariana.github.io/NephoVis/level1.html?type=horde.

Figure 4 shows that only the first order parameters make a difference by themselves: FOC-POS only does so without PPMI:weight, and more so with SOC-WIN:4 than with SOC-WIN:10. PPMI also makes a bigger difference with FOC-POS:all. This pattern is found in all the adjectives, but in none of the other nouns.

Figure 4. Distances between models of ‘horde’ that vary along only one parameter, colored by PPMI and SOC-WIN.

Figure 5 shows the distribution of pairwise distances between models based on the number of shared parameters: models in the 0 column have no parameter in common, while models in the 5 column share all but one (these are the pairs shown in Figure 4). The green boxplots represent pairs of models with different FOC-POS, while the orange and light blue ones share FOC-POS:all and FOC-POS:nav respectively; boxes are split by PPMI. This shows that PPMI:weight models overall tend to be similar to each other and most different from PPMI:no models, but also that pairs of models with different FOC-POS and no PPMI:weight involved are more different from each other than those with the same FOC-POS, and that pairs of models where one has PPMI:weight and the other does not are more similar if both have FOC-POS:nav than if they have different FOC-POS or even share FOC-POS:all. This is not the case with other parameters: with second order parameters, for example, the light blue boxplots are higher than the orange ones!

Figure 5. Distances between models of ‘horde’ by number of shared parameters, colored by whether they share FOC-POS and split by PPMI.

First order filters

Figure 6 shows the quantitative effect of the first order filters. The panels to the left show the number of remaining tokens (top) and first order context words (bottom) after applying each first order filter, and the right panel shows the number of remaining context words per token after applying each filter.

While FOC-WIN and certainly PPMI reduce the number of possible context words and their count per token the most, FOC-POS:nav models are the ones that lose tokens, especially in combination with the other restrictions (up to 6.09%).

Figure 6. Remaining tokens and context words of ‘horde’ after application of first order filters.

Model comparison

To look at the effect of the strongest parameters, the weaker ones will be kept constant at LENGTH:FOC + SOC-WIN:4 + SOC-POS:nav, initially disregarding PPMI:selection. Looking at the distance matrix between the corresponding clouds (Distance matrix 1), the model with the loosest filters seems to be the most different from the rest, with the most similar model being its FOC-WIN:5 counterpart (distance of 0.24), followed by its FOC-POS:nav counterpart (distance of 0.39). The three models with FOC-WIN:5 and something else (either PPMI, FOC-POS or both) seem to be the most similar to each other, with distances between 0.1 and 0.29.

Distance matrix 1. Distance matrix between some models of ‘horde’
| id | FOC-POS | PPMI | FOC-WIN |
|----|---------|------|---------|
| 1 | nav | weight | 10_10 |
| 2 | nav | no | 10_10 |
| 3 | all | weight | 10_10 |
| 4 | all | no | 10_10 |
| 5 | nav | weight | 5_5 |
| 6 | nav | no | 5_5 |
| 7 | all | weight | 5_5 |
| 8 | all | no | 5_5 |

The NMDS solutions with PPMI:no are more susceptible to outliers (discussed in the “Outliers” subsection): the largest part of the cloud is crammed around the center, away from a couple of isolated tokens. The homonyms form two distinct hemispheres, and the senses of the second homonym are clear enough as well, particularly in the models with FOC-POS and/or PPMI filters. Sometimes, the figurative tokens are spread too widely.

Regarding the t-SNE solutions (see Figure 7 for an example), clusters are hard to identify with a perplexity of 5 (unless color-coded), and they are more clear-cut in the PPMI:weight models: there is a big cloud of horde_1 ‘horde’ and two smaller clouds of horde_2 and horde_3 respectively (literal and figurative ‘hurdle’). There are still some tokens between the clear clusters, but they don’t seem to persist across models. If we include the PPMI:selection models, separability improves in FOC-POS:nav models compared to their PPMI:no counterparts, but not as much as in the PPMI:weight models.

Figure 7. Tokens of ‘horde’ in the t-SNE solutions (perplexity 30) of the selected models

With the first order parameters kept constant, the second order parameters seem to have little to no effect; the resulting distance matrices seem to rarely have a value over 0.18. LENGTH:FOC seems to have a bit more of an effect if there are no first order filters, but the difference is more evident in the NMDS solutions (the LENGTH:FOC models are less sensitive to the outliers); looking at the distance matrices, it’s really not so strong. It must be noted that the second order vectors of the LENGTH:FOC models have as many dimensions as remaining context words, which are summarised in the bottom left panel of Figure 6.

All in all, the FOC-POS:nav models seem better than their FOC-POS:all counterparts; and FOC-WIN:5 looks better than FOC-WIN:10 only for FOC-POS:all in the NMDS solution (this should be confirmed with separability indices).

For deeper insight, models with FOC-WIN:10 + FOC-POS:nav + SOC-WIN:4 + SOC-POS:nav will be inspected, comparing PPMI:weight | PPMI:selection and LENGTH:FOC | LENGTH:5000.

Outliers

The outliers are two tokens with only one relevant (and rather infrequent) context word and a concordance in French.1 The most problematic are (6) and (7), where the only surviving context word is kad/noun (an abbreviation of kadetten). The problem is likely that there is only one surviving context word with a rather low frequency (226) and therefore a high PPMI (5.52), which might lead to a rather sparse vector with LENGTH:5000 | LENGTH:10000. LENGTH:FOC vectors seem to deal better with them. While they remain outliers in all models, they skew them less when either the PPMI:weight or the FOC-POS:nav filter is applied, and even less with LENGTH:FOC. These tokens also tend to be peripheral in the t-SNE models.

  6. DCLA ) 8.19 . David Palinckx ( ABES ) 7.84 . 60^m horden 0,914 ( kad ) : Wim Marynissen ( AVKA ) 9.80 .
    ) 8.19 . David Palinckx ( ABES ) 7.84 . 60^m hurdles 0,914 ( cadets ) : Wim Marynissen ( AVKA ) 9.80 .
  7. 57.32 400m horden ( sch-2de reeks ) : 1. Shaun Malone ( ACBR ) 59.22 300m horden ( kad ) : 1. Eri Van Vosselen ( SWIN ) 44.05 Ver ( sen )
    400m hurdles ( sch-2nd lap ) : 1. Shaun Malone ( ACBR ) 59.22 300m hurdles ( cadets ) : 1. Eri Van Vosselen ( SWIN ) 44.05 Ver ( sen

When the power of (6) and (7) is cancelled, another outlier emerges: (8), a fragment from a song in French, Les colonies. Here, if there are no filters, the words et (a conjunction in French, a Latin conjunction otherwise), les (an article in French, ‘lesson’ in Dutch) and de (a preposition in French and an article in Dutch) are counted; the FOC-POS:nav filter excludes de because it’s tagged as a determiner (while the others are tagged as nouns), and a PPMI filter discards les because of its negative PMI with the target. (8) remains consistently an outlier, and rather peripheral in t-SNE models.

  8. oreille pour entendre ’ au secours ’ Où sont passés les baobas et les hordes de gosses Dans cette ère de négoce où ne vivent que les big boss
    an ear to hear ‘help’. Where did the baobas and the hordes of kids go, in this era of trade where only the big bosses live

hoop

The noun hoop was tagged with 3 definitions, reproduced in Table 4. The homonyms are roughly equivalent to ‘lot/heap/bunch’ (hoop_1 in the concrete, specific sense; hoop_2 in the broader sense of ‘a lot of…’) and ‘hope’ (hoop_3). The second homonym is expected to be much more frequent than the first one and very easy to distinguish from it; the first one is not only polysemous but also unbalanced in the frequency of its senses, and highly dependent on the specificity of the context for a confident distinction between them.

Table 4. Definitions of ‘hoop’.
| code | definition | example | freq |
|------|------------|---------|------|
| hoop_1 | 1.1. ongeordende stapel | een hoop rommel, gooi maar op de hoop | 1 |
| hoop_2 | 1.2. grote hoeveelheid | een hoop mensen, een hele hoop geld | 10 |
| hoop_3 | 2. positieve verwachting, vertrouwen op iets positiefs | hoop koesteren, de hoop uitspreken dat… | 29 |

Sense distribution

The sample consists of 320 tokens (8 batches) out of 41946 occurrences in the QLVLNewsCorpus; the distribution of the majority senses of each batch, as well as the pilot-based estimate and the overall distribution, are reproduced in Figure 8. The distributions of the annotations (not majority senses) by annotator are shown in Figure 9. No batch was annotated by 4 annotators. As expected, hoop_3 ‘hope’ is overwhelmingly frequent and hoop_2 ‘heap-quantity’ is more frequent than hoop_1 ‘heap-literal’ (which also seems, at first glance, to be tagged with low confidence). The sense distribution is relatively stable and the most infrequent sense almost always occurs at some point.

Figure 8. Distribution of majority senses of ‘hoop’ per batch

Sense distribution across annotators within batches (Figure 9) looks very uniform, both per homonym and per sense.

Figure 9. Distribution of sense annotations of ‘hoop’ per annotator, grouped by batch.

“hoop” is a noun with two homonyms of very different frequencies, where the least frequent homonym is polysemous with two senses of different frequencies.

Confusion matrix

Matrices

The confusion matrix between the majority senses and other tagged senses can be seen in Table 5 (raw number of tokens with such senses assigned) and Table 6 (mean confidence of such sense annotation in each token). Ideally, there would be no confusion between hoop_1 ‘heap-literal’ and hoop_2 ‘heap-quantity’ on one side and hoop_3 ‘hope’ on the other. Indeed, there are only 4 cases, tagged primarily with hoop_3 ‘hope’, where some annotator also assigned a sense tag of the other homonym. These are discussed in the “Examples” subsection along with the concordances with geen ‘none of the above’ tags.

Table 5. Non weighted sense matrix of ‘hoop’ senses. Proportion of tokens with full agreement per sense-tag is: hoop_1: 0.65, hoop_2: 0.86, hoop_3: 0.96, wrong_lemma: 0.5. Proportion of tokens with full agreement per homonym is: geen: 0.5, pile: 1, hope: 0.96.
| senses | hoop_1 (pile) | hoop_2 (pile) | hoop_3 (hope) | unclear (geen) | wrong_lemma (geen) |
|--------|------|------|------|------|------|
| hoop_1 | 17 | 6 | 0 | 0 | 0 |
| hoop_2 | 8 | 59 | 0 | 0 | 0 |
| hoop_3 | 1 | 3 | 240 | 3 | 3 |
| unclear | 0 | 0 | 1 | 2 | 1 |
| wrong_lemma | 0 | 0 | 1 | 0 | 2 |
| total | 26 | 68 | 242 | 5 | 6 |

The weighted matrix (Table 6) shows a relatively high confidence in the agreeing annotations compared to the disagreeing ones; the overall mean confidence is quite high, and so are the confidence values of most hoop_2 ‘heap-quantity’ annotations, even if they are not boldened. The one unclear case with high confidence in the hoop_3 ‘hope’ annotation is reported as (12) in the “Examples” subsection.

Table 6. Weighted sense matrix of ‘hoop’ senses. Mean confidence across the lemma is 4.64; values above are darker and boldened. Median confidence across the lemma is 5.
| senses | hoop_1 (pile) | hoop_2 (pile) | hoop_3 (hope) | unclear (geen) | wrong_lemma (geen) |
|--------|------|------|------|------|------|
| hoop_1 | 4.72 | 4.5 | 0 | 0 | 0 |
| hoop_2 | 3.62 | 4.58 | 0 | 0 | 0 |
| hoop_3 | 3 | 3.33 | 4.71 | 1 | 2.33 |
| unclear | 0 | 0 | 5 | 0.75 | 1 |
| wrong_lemma | 0 | 0 | 2 | 0 | 4.5 |

Examples

Among the challenging concordances of this lemma, there are instances of inattentive annotation, additional conventional usages, errors in the corpus and hard to parse contexts.

Inattentive annotation: Three of the four concordances with hoop_3 ‘hope’ as majority sense and a tag from the other homonym as alternative are rather straightforward cases. The constructions may be atypical, like in (9), or deceiving for an inattentive annotator: the expression “in de hoop (dat)” in (10) means ‘in the hope (that)’ but could be interpreted as ‘in the heap’ if the sentence is not read in full. The same annotator misidentified the target in (11), where the clause that the expression links to is before the given context.2

  9. . Schitterend met computers geanimeerd mierenepos . Mier Flik redt eigenhandig zijn hoop wanneer een zwerm sprinkhanen aanvalt voor de jaarlijkse plunder . RTL 4 , 20.00
    Only the effort of a massive American military force at the borders of Irak would warrant a real hope that they could make Saddam Hoessein fall and surrender himself without a fight
  10. vinden . " Nijs heeft enkele garages in het buitenland afgebeld , in de hoop zijn gestolen juweeltje terug te vinden . De carjacker is ongeveer 1m70 klein en
    find. "Nijs called a couple of garages abroad, in the hope of finding his stolen jewel. The carjacker is about 1m70 short and
  11. De volle zalen die het trekt met zijn producties sterken de verantwoordelijken in die hoop . De operette heeft in Brussel , Vlaanderen en Nederland grote successen gekend .
    The packed halls that it draws with its productions strengthen the people responsible in that hope. The operetta has had great success in Brussels, Flanders and the Netherlands.

Additional conventional usages: Some of the challenging annotations have brought to light conventional usages of the target that were not contemplated in the original list of senses. Such is the case of (12) and (13). In (12), two annotators assigned a geen ‘none of the above’ tag with minimum confidence and reported being quite lost as to the meaning of the expression, while the other assigned a hoop_3 ‘hope’ tag with maximum confidence. Here the target is part of a Belgian idiomatic expression, “hoop en al” ‘at most, lit. heap and all’, actually derived from hoop_1 ‘heap’. (13) is one of the cases with hoop_3 ‘hope’ as majority sense and hoop_2 ‘heap-quantity’ as alternative. The correct sense is more related to the latter (and actually even more to hoop_1 ‘heap’) than to the former, since the target refers to a mierenhoop ‘anthill’ (it’s both a case of inattentive annotation and of an additional conventional usage).

  12. leukste speelgoed . Van het oorspronkelijke dierenbestand van het park blijven op dit ogenblik hoop en al één lama , enkele herten en drie pauwen over . Navraag leerde
    the nicest toy. Of the original animal stock of the park there remain at this moment at most (lit. heap and all) one llama, some deer and three peacocks. Further enquiries revealed
  13. . Schitterend met computers geanimeerd mierenepos . Mier Flik redt eigenhandig zijn hoop wanneer een zwerm sprinkhanen aanvalt voor de jaarlijkse plunder . RTL 4 , 20.00
    Splendid ant epic animated with computers. The ant Flik singlehandedly saves his hill when a swarm of grasshoppers attacks for the yearly plunder. RTL 4, 20.00

Errors in the corpus: Some of the tokens with geen ‘none of the above’ tags reveal issues in the corpus itself, such as forms of the verb hopen ‘to hope’ mistaken for the noun (examples 14 and 15), spelling mistakes between the target and hoog ‘high’ (examples 16 and 17), and the target as part of proper names. Sometimes the role of the target in the proper name is already obscure, such as in the surname “De Hoop Scheffer”, but sometimes it retains its meaning, like in the name of the theater (or theater group) “Hoop in de toekomst” ‘Hope in the future’ and the organization “Hoop der Renners” ‘Hope of the racers’. In the first case, all annotators identified it as a name with maximum confidence, but in the other two, one annotator selected geen ‘none of the above’ and reported insufficient context and the other two selected hoop_3 ‘hope’.

In both (14) and (15) two annotators assigned hoop_3 ‘hope’ with high confidence and the other one assigned geen ‘none of the above’ (with high and minimum confidence respectively) stating that the target was a verb. While semantically the hoop_3 ‘hope’ tag is correct, it is indeed a different lemma.

  14. Hopen op beterschap , vrezen voor erger Landskampioen Anderlecht raakt in Lommel niet verder dan
    Hoping for recovery, fearing for worse National champion Anderlecht will not go further than […] in Lommel
  15. deze keer niet zo erg en valt te tolereren want toch ruime cijfers . Hopen dat hij het de volgende keer , wanneer het wél moet , wél doet " ,
    not so bad this time and can be tolerated because the figures are still ample. Hoping that next time, when it really matters, he does it",

Example (16) received two geen ‘none of the above’ tags with medium/high confidence, reporting a misspelling of hoog ‘high’, and one hoop_3 ‘hope’ tag with medium confidence; (17), instead, received two hoop_3 ‘hope’ annotations with maximum confidence and one geen ‘none of the above’ tag with medium confidence reporting a spelling mistake and suggesting two possible readings: either “liep de zege nog hoog op” ‘the victory rose further’ or even “riep de zege nog hoop op” ‘the victory cried hope’ (?). It is most likely meant to be hoog ‘high’.

  16. beste verhopen voor de komende weken . De belangrijke wedstrijden volgen mekaar nu in hoop tempo op . De Luikenaars verloren drie keer op rij en zullen naar Brussel
    best hopes for the following weeks. The important matches follow each other now in hope/high tempo. The Liégeois lost three times in a row and will […] to Brussels
  17. overtuigend . Daarna werden de Luxemburgers in conditie overklast en liep de zege nog hoop op . Mark Schmetz was met acht doelpunten topscorer . Harold Musser
    convincing. Afterwards the Luxemburgers were [football stuff I don’t know] and the victory climbed hope/up. Mark Schmetz had the top score with eight points. Harold Musser

Hard to parse contexts: Examples (18) and (19) have rather unclear contexts. In the former, the target is part of a title and only next to a proper name, but the rest of the context gives enough information that most of the annotators assigned hoop_3 ‘hope’. In the latter, the target is also part of a title “Hoop en Vreugde” ‘Hope and joy’, in which the meaning is still active, but the concordance is so chaotic without paratext that all three annotators assigned geen ‘none of the above’. Current models don’t include either of these tokens.

  18. NA Hoop Nick De Loenen ( Wintam ) Nick De Loenen is Wintams hoop in bange dagen
    Hope Nick De Loenen (Wintam) Nick De Loenen is the hope of Wintam in fearful days
  19. Kruisstraat Brakel-Onkerzele 5-1 , RC Jager Brakel-Horebeke 1-2 , Zwaluwen Impe-Beekboys 1-3 . 2 : Hoop en Vreugd-Kilim 1-5 , Real Lapino-Kazuivelshotters 1-0 . 3A : Astene-FC Machelen 1-3 , Smetlede-Zwaluwen
    Kruisstraat Brakel-Onkerzele 5-1 , RC Jager Brakel-Horebeke 1-2 , Zwaluwen Impe-Beekboys 1-3 . 2 : Hope and Joy-Kilim 1-5 , Real Lapino-Kazuivelshotters 1-0 . 3A : Astene-FC Machelen 1-3 , Smetlede-Zwaluwen

Nephology of hoop

A first impression of the clouds relates to the stress values of the dimensionality reduction and to the parameters that make the strongest distinctions between models. We have 144 models of hoop, created on 10/03/2020 and modelling between 298 and 317 tokens. The stress value of the NMDS solution for the cloud of models is 0.17.

Strength of parameters

The cloud of models has two clear groups divided by FOC-POS along the first dimension, and three clear groups within each of them based on PPMI, with PPMI:selection between PPMI:no and PPMI:weight but closer to the former than to the latter, and with a better distinction within the FOC-POS:all group. Within the FOC-POS:all subgroups, SOC-WIN draws further divisions (Figure 11). The stress values of the NMDS solutions of these models range between 0.237 and 0.317.

Figure 11. Cloud of models of ‘hoop’. Explore it at https://montesmariana.github.io/NephoVis/level1.html?type=hoop.

The distances between models that vary along only one parameter take a very different shape from what is seen with horde (Figures 4 and 5). In the first place, the distances are much larger, with a median around 0.4 even for models with 5 shared parameters. Second, pairs of models with different FOC-POS are extremely different from each other, while those with FOC-POS:all show the smallest differences, higher only when one of the models has PPMI:weight and the other does not. Finally, PPMI values don’t have much of an impact on the strength of FOC-POS, other than the one just mentioned (Figure 12). No other parameter exhibits this behaviour for this lemma: FOC-POS is definitely the most relevant parameter.

Figure 12. Distances between models of ‘hoop’ by number of shared parameters, colored by whether they share FOC-POS and split by whether they share PPMI.

Figure 13 also shows a particular picture, with much higher values for FOC-POS, and with values that are lower under PPMI:weight for the first order parameters but higher for LENGTH and SOC-POS.

Figure 13. Distances between models of ‘hoop’ that vary along only one parameter, colored by PPMI.

First order filters

Figure 14 shows the quantitative effect of the first order filters. The panels to the left show the number of remaining tokens (top) and first order context words (bottom) after applying each first order filter, and the right panel shows the number of remaining context words per token after applying each filter.

FOC-POS:nav, alone and in combination with the other restrictions, filters out tokens; up to 5.99% are lost with the strictest filters. PPMI reduces the total number of context words (bottom left panel) much more than FOC-WIN, but reduces the number of context words per token (right panel) to the same degree, which means that the remaining context words occur more often. (This also diverges from the situation of horde.)

Figure 14. Remaining tokens and context words of ‘hoop’ after application of first order filters.

Model comparison

To compare the stronger variables, we first keep the weaker ones (SOC-POS:nav + SOC-WIN:4 + LENGTH:FOC) constant, initially also disregarding PPMI:selection. It must be remembered that the distances between the models are still quite large. Figure 15 shows the selected models, while Figure 16 replaces the PPMI:weight ones with PPMI:selection: the comparison between PPMI:selection and PPMI:no is something in between, in terms of contrast between distances. In the former, the smallest differences are found between models that only differ in FOC-WIN; in the latter, the difference between FOC-POS:all models with the same FOC-WIN is smaller.

Distance matrix 2. Distance matrix between some models of ‘hoop’
| id | FOC-POS | PPMI | FOC-WIN |
|----|---------|------|---------|
| 1 | nav | weight | 10_10 |
| 2 | nav | no | 10_10 |
| 3 | all | weight | 10_10 |
| 4 | all | no | 10_10 |
| 5 | nav | weight | 5_5 |
| 6 | nav | no | 5_5 |
| 7 | all | weight | 5_5 |
| 8 | all | no | 5_5 |
Distance matrix 3. Distance matrix between some models of ‘hoop’
| id | FOC-POS | PPMI | FOC-WIN |
|----|---------|------|---------|
| 1 | nav | selection | 10_10 |
| 2 | nav | no | 10_10 |
| 3 | all | selection | 10_10 |
| 4 | all | no | 10_10 |
| 5 | nav | selection | 5_5 |
| 6 | nav | no | 5_5 |
| 7 | all | selection | 5_5 |
| 8 | all | no | 5_5 |

Without color coding, the NMDS solutions look very similar to each other, quite round and with few outliers (more with PPMI:selection than with PPMI:weight). The clouds are strongly dominated by tokens of the second homonym, ‘hope’, which is highly frequent. Those of the first homonym, ‘heap’, do tend to stick together in the PPMI:weight models (while they are much more dispersed in PPMI:no models), although not distinctly separate from ‘hope’.

The t-SNE models are unstructured archipelagos with perplexity 5, and the PPMI:weight | FOC-POS:nav models have some tight groupings with higher perplexity (small groups of tokens of “hoop opgeven” ‘give up hope’, “hoop uitspreken” ‘express hope’, or co-occurrences with nieuw ‘new’), but they mostly seem to grow more dispersed, while their PPMI:no + FOC-POS:all counterparts grow denser. Only FOC-POS:all models have clusters matching senses in these clouds (Figure 17).

Figure 17. Tokens of ‘hoop’ in the t-SNE solutions (perplexity 30) of the selected models

What Figure 17 cannot show is the relative position of the tokens to each other: there are not many clear clusters, certainly not consistent across models, and selecting certain areas shows that tokens group in very different ways in different models. It is necessary to take a closer look at those that seem to show some sort of cluster to see what is being represented by each configuration of parameters.

Given the previous observations, a selection of PPMI:weight + FOC-POS:all models was examined to compare the effect of the weaker parameters. While the difference is not striking, and harder to assess without a deeper inspection of the tokens, the models with FOC-WIN:5 + SOC-WIN:4 + LENGTH:5000 show relatively nice clustering (e.g. Figure 18).

Figure 18. Reasonable token cloud of ‘hoop’.

For further inspection it could be interesting to start with the cloud in Figure 18 and some variations of it.


spot

The noun spot was tagged with 3 definitions, reproduced in Table 7. The homonyms mean roughly ‘ridicule’ (spot_1) and ‘spot(light)’, with a literal (or metaphorical) spotlight for spot_3 and, metonymically, a videoclip for spot_2. The two homonyms have similar frequencies, as do the two senses of the polysemous one, but a relatively high number of challenging tokens is expected.

Table 7. Definitions of ‘spot’.
| code | definition | example | freq |
|------|------------|---------|------|
| spot_1 | 1. oneerbiedige, ridiculiserende uitspraak of behandeling | de spot drijven met, bijtende spot | 15 |
| spot_2 | 2.1. reclameboodschap via radio, televisie, bioscoop | een spotje voor tandpasta | 9 |
| spot_3 | 2.2. schijnwerper | de spots richten op | 7 |

Sense distribution

The sample consists of 240 tokens (6 batches) out of 3496 occurrences in the QLVLNewsCorpus; the distribution of the majority senses of each batch, as well as the pilot-based estimate and the overall distribution, are reproduced in Figure 19. The distributions of the annotations (not majority senses) by annotator are shown in Figure 20. Batch 2 was annotated by 4 annotators.

While some batches have more skewed distributions (the first one has mainly “ridicule” cases and the sixth one, “spotlight” cases), the overall distribution resembles the estimated one, with fewer cases of ambiguous tokens but still rather balanced frequencies.

Figure 19. Distribution of majority senses of ‘spot’ per batch

Figure 20 shows that the sense distribution across annotators within batches is quite stable, with the exception of the first annotator of batch 6, who assigned a geen ‘none of the above’ tag to more than a quarter of their tokens rather than to spot_3 ‘spotlight’ like their colleagues.

Figure 20. Distribution of sense annotations of ‘spot’ per annotator, grouped by batch.

“spot” is a noun with two homonyms of similar frequency, one of which has two senses of similar frequency (but the distribution does not seem robust across batches).

Confusion matrix

Matrices

The confusion matrix between the majority senses and other tagged senses can be seen in Table 8 (raw number of tokens with such senses assigned) and Table 9 (mean confidence of such sense annotation in each token).

We would expect no confusion between spot_1 ‘ridicule’ on one side and spot_2 ‘videoclip’ and spot_3 ‘spotlight’ on the other, but also rare confusion between the senses of the second homonym. It’s indeed the case that a very small number of tokens with one of these senses as majority sense received a tag from another sense; most of the confusion comes from geen ‘none of the above’ annotations, particularly wrong_lemma and not_listed. In the spot_3 ‘spotlight’ cases with wrong_lemma tags, the annotator suggested that it might be referring to the name of a magazine —the concordances are indeed quite particular and similar to each other, so we could expect them to cluster in the clouds. There is also a strong group of tokens with not_listed as the majority sense: there, the target item is part of the English expression hot spot, and the annotators suggested the meaning of “place” and/or pointed out that it’s from English.

Table 8. Non weighted sense matrix of ‘spot’ senses. Proportion of tokens with full agreement per sense-tag is: not_listed: 0.84, spot_1: 0.97, spot_2: 0.77, spot_3: 0.77. Proportion of tokens with full agreement per homonym is: geen: 0.68, ridicule: 0.97, film/spotlight: 0.81.
(columns grouped by homonym: ridicule = spot_1; film/spotlight = spot_2, spot_3; geen = between, not_listed, unclear, wrong_lemma)
senses spot_1 spot_2 spot_3 between not_listed unclear wrong_lemma
spot_1 105 0 1 1 0 1 0
spot_2 3 47 4 0 1 3 0
spot_3 0 0 62 0 1 3 10
not_listed 0 0 3 0 19 0 0
unclear 1 1 0 0 0 3 1
no_agreement 1 1 3 0 0 3 3
total 110 49 73 1 21 13 14

The confidence of annotations in agreement is quite high, with relatively low confidence for the hot spot group (not_listed row and column). The unexpected part is the mean confidence of 5 for the three cases where the majority sense was spot_2 ‘videoclip’ but the minority sense spot_1 ‘ridicule’, a different homonym. They could be considered ambiguous, particularly if primed with other input.

Table 9. Weighted sense matrix of ‘spot’ senses. Mean confidence across the lemma is 4.32; values above it are shown darker and in bold. Median confidence across the lemma is 5.
(columns grouped by homonym: ridicule = spot_1; film/spotlight = spot_2, spot_3; geen = between, not_listed, unclear, wrong_lemma)
senses spot_1 spot_2 spot_3 between not_listed unclear wrong_lemma
spot_1 4.74 0 1 0 0 0 0
spot_2 5 4.35 3 0 1 3.67 0
spot_3 0 0 4.27 0 1 2.33 1.8
not_listed 0 0 1 0 3.4 0 0
unclear 4 3 0 0 0 1.17 0
no_agreement 2 3 3.83 0 0 0 3

Examples

Among the challenging concordances of this lemma, there are instances of inattentive annotation, additional conventional usages, hard-to-parse contexts and reasonable ambiguities.

Inattentive annotation: The one concordance with spot_1 ‘ridicule’ as majority sense and an alternative from the other homonym, namely spot_3 ‘spotlight’, is shown in (20). The confusion could come from the possible combination of “helder” ‘clear’ and “spot” in the ‘spotlight’ meaning, but the rest of the context suggests the majority sense. Moreover, the disagreeing annotator was the only one who did not select “helder” ‘clear’ as a relevant context word.

  (20) toen Walter drie was . Maar ook op zijn afkomst keek Matthau met heldere spot terug . " De joden vonden het schuldgevoel uit en de Ieren maakten er
    when Walter was three. But also at his origins Matthau looked back with lucid scorn. "The Jews invented the feeling of guilt and the Irish made

Additional conventional usages: The 19 tokens with not_listed as majority sense reveal a number of occurrences of spot coming from English and retaining their meaning of ‘place, point in space’ in restricted fixed expressions: most of them are cases of hot spot (mostly as attractive places, but also in contexts of conflict and crime), two of “sweet spot” and one of “soft spot”. The expressions are inserted in regular Dutch sentences and were attested in all batches and in all newspapers except Het Laatste Nieuws; the three cases with spot_3 ‘spotlight’ as alternative are from the same batch, with the same annotator assigning that sense instead of geen ‘none of the above’ (even selecting sweet/spot as relevant context words), but they are as straightforward as the rest. Two of them are reproduced in (21) and (22).

  (21) toestellen volledig automatisch in formatie te laten vliegen , en om ze te helpen elkaars sweet spot te vinden , de plaats van de opwaartse luchtstroom . Het volgende doel ,
    to let planes fly in formation completely automatically, and to help them find each other’s sweet spot, the place of the upward airflow. The next goal,
  (22) alleen kén ik geen vrouwen die beantwoorden aan het geschetste profiel , die alle Europese hot spots kennen en altijd de nieuwste en meest exclusieve cosmetische producten in huis halen .
    I just don’t know any women that fit the sketched profile, that know all the European hot spots and always bring home the newest and most exclusive cosmetic products.

Hard-to-parse contexts: In some cases the context is not clear enough to disambiguate the token. That is the situation of one concordance line consisting of just an unordered list of words, which received unclear as majority sense. But there are also situations where the register of the text is key to understanding the meaning yet cannot be deduced from the raw text without paratextual information. This is exemplified by all but one of the 14 tokens with some wrong_lemma annotation, 10 of which had spot_3 ‘spotlight’ as majority sense. (23) corresponds to some sort of TV guide: one of the annotators assigned spot_2 ‘spot-videoclip’, which is referentially accurate but not the meaning of the target, and the other two assigned geen ‘none of the above’, one reporting insufficient context and the other one suggesting it was the name of an English(-speaking?) program. Another token, with unclear as majority sense, shows a similar situation. The rest of the concordances appear in the same batch and follow the same pattern (see example (24)): they are articles from Het Nieuwsblad between 2003 and 2004 and the target is the first word in all but one of them; it is immediately followed by an expression like “op 1ste/2de/3de” ‘on 1st/2nd/3rd’, except for three in which it is followed by the name of a town, and then some full sentence on a sports topic. In the ones where the name of a town follows the target, the majority of the annotators selected geen ‘none of the above’; in the rest, the majority assigned spot_3 ‘spotlight’ with varying degrees of confidence, and the same annotator consistently assigned geen ‘none of the above’, almost always suggesting it might be the name of a magazine.

  (23) onder Idi Amin massaal afgemaakt . NGC , 21.00-21.30u. Dokwerk : Red spots . Zie TV-vooraf . Ned.3 , 21.00-22.00u. Het ongeluk .
    killed en masse under Idi Amin. NGC, 21.00-21.30u. Dockwork: Red spots. See TV preview. Ned.3, 21.00-22.00u. The accident.
  (24) Spots op tweede Walem moest tegen Broechem in laatste instantie nog James Van Vaerenbergh aan
    Spots on second [place] Walem had to, against Broechem, still bring on James Van Vaerenbergh at the last moment

Reasonable ambiguities: The three tokens with spot_2 ‘spot-videoclip’ as majority sense and spot_1 ‘ridicule’ as alternative, (25) through (27), could indeed be regarded as ambiguous. The majority sense looks the most likely, but the alternative is not a bad fit either and, in fact, always received maximum confidence. They belong to different batches.

  (25) hier namelijk knedliky om hun hersenstam zitten . " In vergelijking met de vrolijke spot van Cerný komt Pavel CZácek , de vroegere student journalistiek , dodelijk serieus over .
    people here [have] precisely knedliky around their brainstems. In comparison to the cheerful spot/joke of Cerný, Pavel CZácek, previously a journalism student, comes across as deadly serious.
  (26) partij Nieuw Rechts , omdat de inhoud te racistisch zou zijn . " De spot is suggestief , racistisch en discrimineert " , zegt commercieel directeur Theo van der Gun van
    party New Right, because the content would be too racist. "The spot is suggestive, racist and discriminates", says commercial director Theo van der Gun of
  (27) Voordien was er enkel televisiereclame op de Franstalige zenders , maar plots moesten er ook Nederlandstalige spots ingesproken worden " , zegt Ramaekers . Ook hij beaamt dat de sector de
    Before, there was only TV advertising on the French-speaking networks, but suddenly Dutch-language spots also had to be recorded", says Ramaekers. He too agrees that the sector

Nephology of spot

A first impression of the clouds relates to the stress values of the dimensionality reduction and the parameters that make the strongest distinctions between models. We have 144 models of spot created on 10/03/2020, modeling between 220 and 235 tokens. The stress value of the NMDS solution for the cloud of models is 0.106.
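
As a reminder of what these stress values measure, the sketch below computes Kruskal's stress-1 for a toy configuration; note that a proper NMDS implementation compares the embedded distances against monotonically transformed disparities, which this simplified version skips.

```python
import numpy as np
from scipy.spatial.distance import pdist

def stress_1(d_high, d_low):
    """Kruskal's stress-1: sqrt(sum((dhat - d)^2) / sum(d^2)), where d are
    the distances in the low-dimensional configuration. Proper NMDS uses
    monotonically transformed disparities as dhat; this simplified sketch
    plugs in the raw high-dimensional distances instead."""
    d_high = np.asarray(d_high, dtype=float)
    d_low = np.asarray(d_low, dtype=float)
    return np.sqrt(np.sum((d_high - d_low) ** 2) / np.sum(d_low ** 2))

rng = np.random.default_rng(0)
models_high = rng.random((144, 50))   # toy stand-in for the 144 models
models_2d = rng.random((144, 2))      # toy stand-in for an NMDS solution
print(stress_1(pdist(models_high), pdist(models_2d)))
```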

Strength of parameters

The grouping of the cloud of models seems to come firstly from an interaction between FOC-POS and PPMI, so that the right half belongs to those with FOC-POS:all + PPMI:selection | PPMI:no, the bottom left quarter is populated by PPMI:weight models and the top left quarter by FOC-POS:nav + PPMI:selection | PPMI:no (Figure 21). The PPMI:weight group is divided by FOC-WIN, but such division is less clear in the rest of the subclouds. The stress values of the NMDS solutions of these models range between 0.165 and 0.264.

Figure 21. Cloud of models of ‘spot’. Explore it here: https://montesmariana.github.io/NephoVis/level1.html?type=spot

Figure 22 shows the distribution of pairwise distances between models based on the number of shared parameters: models in the 0 column have no parameter in common, while models in the 5 column share all but one. The green boxplots represent pairs of models with different FOC-POS, while the orange and light blue ones share FOC-POS:all and FOC-POS:nav respectively; panels are split by PPMI. This plot shows that PPMI:weight models overall tend to be similar to each other (lower right panel) and most different from PPMI:no models (upper right panel); but also that pairs of models with different FOC-POS and no PPMI:weight involved are more different from each other than those with the same FOC-POS (left panels and upper middle panel), and that pairs of models where one has PPMI:weight and the other one doesn't are more similar if both have FOC-POS:nav than if they have different FOC-POS or even share FOC-POS:all (middle panels). This effect is not so visible in FOC-WIN and is reversed for the second order parameters: pairs of models that share the same value tend to have larger distances than those with different values.

Figure 22. Distances between models of ‘spot’ by number of shared parameters, colored by FOC-POS and split by PPMI.
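
The bookkeeping behind this kind of plot is straightforward; here is a minimal sketch, assuming each model is described by a dict of its parameter settings and that a square matrix of pairwise model distances is already available.

```python
from itertools import combinations

# The six parameters varied in this study.
PARAMS = ["FOC-POS", "FOC-WIN", "PPMI", "SOC-POS", "SOC-WIN", "LENGTH"]

def shared(m1, m2):
    """Number of parameter values two models (dicts) have in common;
    pairs in column 5 of Figure 22 share all but one."""
    return sum(m1[p] == m2[p] for p in PARAMS)

def distances_by_shared(models, dist):
    """Group pairwise model distances by number of shared parameters.
    `models` is a list of parameter dicts; `dist` a square matrix of
    precomputed distances between models (assumed given)."""
    groups = {}
    for (i, m1), (j, m2) in combinations(enumerate(models), 2):
        groups.setdefault(shared(m1, m2), []).append(dist[i][j])
    return groups
```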

Figure 23 focuses on pairs of models with only one different parameter value: it shows that the individual effect of first order parameters is much larger than that of the second order parameters, that SOC-WIN:4 tends to increase the individual effect of most parameters, and that FOC-POS:all increases the individual effect of PPMI and SOC-WIN. PPMI:weight only affects the individual effect of FOC-POS.

Figure 23. Distances between models of ‘spot’ that vary along only one parameter, colored by SOC-WIN and FOC-POS.

First order filters

Figure 24 shows the quantitative effect of the first order filters. The panels to the left show the number of remaining tokens (top) and first order context words (bottom) after applying each first order filter, and the right panel shows the number of remaining context words per token after applying each filter.

Only FOC-POS:nav filters out tokens by itself: 3.83% on its own, increasing to 6.38% with all restrictions applied; it also seems to have the greatest effect on the number of context words per token. The lost tokens turn out to be mostly the “magazine” group (like example (24)). The models that don't remove them group them tightly in all solutions, both NMDS and t-SNE with any perplexity.

Figure 24. Remaining tokens and context words of ‘spot’ after application of first order filters.
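
A minimal sketch of the counting behind this kind of figure, under the assumption that each token carries (word, pos, offset, pmi) context features and that nav selects nouns, adjectives and verbs; the function and data structures are illustrative, not the project's actual code.

```python
def apply_filters(tokens, window=None, nav_only=False, pmi_threshold=None):
    """tokens: dict mapping token ids to lists of (word, pos, offset, pmi)."""
    kept = {}
    for tok_id, context in tokens.items():
        filtered = [
            (word, pos, offset, pmi)
            for (word, pos, offset, pmi) in context
            if (window is None or abs(offset) <= window)
            and (not nav_only or pos in {"noun", "adj", "verb"})
            and (pmi_threshold is None or pmi >= pmi_threshold)
        ]
        if filtered:  # a token with no context words left is lost
            kept[tok_id] = filtered
    n_tokens = len(kept)
    n_types = len({word for ctx in kept.values() for (word, _, _, _) in ctx})
    per_token = sum(len(ctx) for ctx in kept.values()) / max(n_tokens, 1)
    return n_tokens, n_types, per_token
```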

Model comparison

To compare the effect of the stronger parameters, we will first set the weaker parameters to SOC-WIN:4 + SOC-POS:nav + LENGTH:FOC, and initially discard PPMI:selection.

The distance matrix between the 8 remaining models (Distance matrix 4) suggests a main grouping by PPMI: the PPMI:weight models tend to be more similar to each other (with distances ranging between 0.21 and 0.41) than the PPMI:no ones (0.40-0.73). The model most different from the rest is the one without any restrictions; its NMDS solution shows a dense center and some outliers. A sketch of one plausible way such model distances might be computed follows the matrix.

Distance matrix 4. Distance matrix between some models of ‘spot’
id FOC-POS PPMI FOC-WIN
1 nav weight 10_10
2 nav no 10_10
3 all weight 10_10
4 all no 10_10
5 nav weight 5_5
6 nav no 5_5
7 all weight 5_5
8 all no 5_5
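
The text does not spell out here how the distance between two models is computed; purely as an illustration, one plausible operationalization (not necessarily the one used in this project) correlates the token-by-token distance patterns of the two models.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata

def model_distance(vectors_a, vectors_b):
    """One plausible distance between two token-level models of the same
    tokens: 1 minus the (Spearman-like) rank correlation of their
    token-by-token cosine distance patterns.
    vectors_a, vectors_b: arrays of shape (n_tokens, n_dims)."""
    d_a = pdist(vectors_a, "cosine")
    d_b = pdist(vectors_b, "cosine")
    r = np.corrcoef(rankdata(d_a), rankdata(d_b))[0, 1]
    return 1 - r
```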

The picture given by replacing PPMI:no with PPMI:selection is only slightly less drastic, while comparing PPMI:no and PPMI:selection models results in a bigger distance between any pair of models with different FOC-POS: distances between models with the same FOC-POS range between 0.31 and 0.5 (with exceptional values of 0.27 and 0.62), and between models with different FOC-POS, between 0.57 and 0.82 (Distance matrix 5).

Distance matrix 5. Distance matrix between some models of ‘spot’
id FOC-POS PPMI FOC-WIN
1 nav selection 10_10
2 nav no 10_10
3 all selection 10_10
4 all no 10_10
5 nav selection 5_5
6 nav no 5_5
7 all selection 5_5
8 all no 5_5

An inspection without color coding shows that PPMI:no NMDS solutions are more sensitive to certain outliers; the t-SNE models tend to show two pockets and a scattered mass, the pockets being small with perplexity 5, one of them growing for perplexity 20, and the scattered mass becoming more scattered with increasing perplexity. With perplexity 50, the pockets are only visible in PPMI:weight models and, to a smaller degree, in FOC-POS:nav + FOC-WIN:5 models.
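
To make the perplexity sweep concrete, here is a minimal scikit-learn sketch on toy token vectors; the dimensionality and settings are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
token_vectors = rng.random((220, 300))  # toy stand-in for one model of 'spot'

# 2D t-SNE solutions at the perplexities discussed in the text.
solutions = {
    p: TSNE(n_components=2, perplexity=p, metric="cosine",
            init="random", random_state=0).fit_transform(token_vectors)
    for p in (5, 20, 30, 50)
}
```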

Color coding lets us identify the clear group of hot spot tokens, which cluster even in the NMDS solutions (while those of “sweet spot” and “soft spot” remain spread around). The rest of the cloud is clearly split between the homonyms and less clearly between senses, with denser clouds surrounded by some outliers in the PPMI:no models. The t-SNE models (Figure 25) show a very good split between homonyms even with low perplexity, nicer for PPMI:weight models, while the PPMI:no ones become too dispersed too soon. Perplexity 50 seems too high: the tokens are too scattered. The big pocket belongs to spot_1 ‘ridicule’, while the biggest cloud joins the senses of the second homonym. Strangely enough, some tokens of spot_1 are included in the big cloud (the same ones across solutions), so they would be worth a deeper investigation.

Figure 25. Tokens of ‘spot’ in the t-SNE solutions (perplexity 30) of the selected models

Fixing stronger parameters to look at the weaker ones (FOC-WIN:10 + PPMI:weight + LENGTH:FOC) gives us very similar clouds.

“spot” seems to offer a good example of the granularity of homonymy versus polysemy, but some cases require further observation. For that step it could be interesting to look at the models with PPMI:weight + FOC-WIN:10 for the first order parameters and SOC-POS:nav + SOC-WIN:4 + LENGTH:FOC | LENGTH:5000 for second order. Selecting the FOC-POS would depend on whether we want to model the “Spot op…” tokens or not.


staal

The noun staal was tagged with 4 definitions, reproduced in Table 10. The homonyms correspond roughly to “steel” (referring to the material in staal_1 and to an object made of it in staal_2) and “sample” (in a general sense in staal_3 and with the specific connotation of “evidence” in staal_4). Both homonyms are then polysemous, although with very skewed distributions, and staal_4 was not recorded in the original estimation. Of the 6 remaining tokens in the pilot sample, 5 were instances of “man van staal” ‘man of steel’ and the other one could not be placed confidently in any category.

Table 10. Definitions of ‘staal’.
tag code definition example freq
staal_1 1.1 zeer hard ijzer met laag koolstofgehalte twaalf ton staal, ijzer en staal, een man van staal 21
staal_2 1.2 voorwerp of deel van een voorwerp uit zulk metaal het staal van de velgen is verroest 3
staal_3 2.1 monster van een stof of materiaal, bij wijze van proef een staal vragen, goederen op staal verkopen 10
staal_4 2.2 proef, voorbeeld, bewijs een staaltje van hun kunnen, een staaltje van bewaamheid 0

Sense distribution

The sample consists of 320 tokens (8 batches) out of 5796 occurrences in the QLVLNewsCorpus; the distribution of the majority senses of each batch, as well as the pilot-based estimate and the overall distribution, are reproduced in Figure 26. The distributions of the annotations (not majority senses) by annotator are shown in Figure 27. No batch was annotated by 4 annotators. If we consider the confusing cases in the pilot concordance to correspond to staal_1 ‘steel-material’, the overall distribution resembles the estimation, with a great majority of staal_1 ‘steel-material’ cases, followed by staal_3 ‘sample-general’ and staal_2 ‘steel-object’. The number of cases without an assigned sense is much smaller, and staal_4 ‘sample-evidence’ is shown to occur. The distribution does seem to vary a lot between batches, with a great majority of staal_1 ‘steel-material’ tokens in the first four batches and a greater number of staal_3 ‘sample-general’ in the other four.

Figure 26. Distribution of majority senses of ‘staal’ per batch

Figure 27 already suggests confusion between senses of the same homonym: while the proportion of tokens annotated with a certain homonym remains quite stable across annotators of the same batch, some annotators seem to have a greater tendency to assign staal_2 ‘steel-object’ (in batches 2 through 6) or staal_4 ‘sample-evidence’ (the same annotator in batch 5, and also batches 6 and 8) than their colleagues.

Figure 27. Distribution of sense annotations of ‘staal’ per annotator, grouped by batch.

“staal” is a noun with two homonyms of very different frequencies, both with two senses of very different frequencies.

Confusion matrix

Matrices

The confusion matrix between the majority senses and other tagged senses can be seen in Table 11 (raw number of tokens with such senses assigned) and Table 12 (mean confidence of such sense annotation in each token).

We expect some confusion between the senses of each homonym, with little or no confusion between homonymous items. That seems indeed to be the case: there is barely any confusion between homonyms, and some between senses of the same homonym, particularly cases where the majority assigned a more general sense (staal_1 ‘steel-material’ or staal_3 ‘sample-general’) and a minority the more specific one (staal_2 ‘steel-object’ or staal_4 ‘sample-evidence’ respectively), which is to be expected if the context is not very precise. In fact, there are many more tokens with these tags as minority sense than as majority sense; none of the cases of staal_4 ‘sample-evidence’ shows full agreement, and only 0.38 of staal_2 ‘steel-object’ does. Figure 27 has shown us that these annotations come from individual annotators in certain batches with a greater tendency to assign a more specific sense in comparison to their colleagues. There are very few cases that couldn't be assigned any of the given tags, and only one where the annotators couldn't agree.

Table 11. Non-weighted sense matrix of ‘staal’ senses. Proportion of tokens with full agreement per sense-tag is: staal_1: 0.69, staal_2: 0.38, staal_3: 0.59, wrong_lemma: 1. Proportion of tokens with full agreement per homonym is: geen: 0.33, steel: 0.96, sample: 0.93.
(columns grouped by homonym: steel = staal_1, staal_2; sample = staal_3, staal_4; geen = not_listed, unclear, wrong_lemma)
senses staal_1 staal_2 staal_3 staal_4 not_listed unclear wrong_lemma
staal_1 229 62 1 3 4 1 0
staal_2 7 13 0 0 0 1 0
staal_3 4 0 66 22 0 1 0
staal_4 0 0 8 8 0 0 0
not_listed 0 0 1 0 1 0 0
unclear 0 0 1 0 0 1 0
wrong_lemma 0 0 0 0 0 0 1
no_agreement 1 0 1 0 0 1 0
total 241 75 78 33 5 5 1

In broad terms, mean confidence seems high in cases of agreement, and clearly higher for the “sample” homonym than for “steel”. The cases with staal_2 ‘steel-object’ as majority sense seem to get a lower confidence, although those with staal_1 ‘steel-material’ as majority and staal_2 ‘steel-object’ as alternative have a higher mean confidence.

Table 12. Weighted sense matrix of ‘staal’ senses. Mean confidence across the lemma is 4.03; values above it are shown darker and in bold. Median confidence across the lemma is 4.
(columns grouped by homonym: steel = staal_1, staal_2; sample = staal_3, staal_4; geen = not_listed, unclear, wrong_lemma)
senses staal_1 staal_2 staal_3 staal_4 not_listed unclear wrong_lemma
staal_1 4.02 4.05 3 2.33 3.5 1 0
staal_2 3.29 3.28 0 0 0 1 0
staal_3 3.5 0 4.26 4.18 0 3 0
staal_4 0 0 4.25 4.06 0 0 0
not_listed 0 0 3 0 2 0 0
unclear 0 0 5 0 0 3 0
wrong_lemma 0 0 0 0 0 0 5
no_agreement 0 0 3 0 0 0 0

Examples

Among the challenging concordances of this lemma, there are instances of inattentive annotation, additional conventional usages, creative usages, errors in the corpus and hard-to-parse contexts.

Inattentive annotation: Some disagreements in the annotation can only be explained by inattentive or insufficiently thorough annotation. The cases with disagreement between homonyms belong to the first kind: there is nothing in the concordances that could allow confusion. The “steel” tokens with “sample” as alternative are not even in the form of a sample; the “sample” tokens with “steel” as alternative have nothing to do with the material. Other cases could lead to reasonable doubt between senses of the same homonym, but the annotators simply reported insufficient context instead of specifying what the issue was. For example, the one token with unclear as majority sense, shown in (28), had staal_3 ‘sample-general’ as alternative with maximum confidence, which is perfectly reasonable. It could be that the other two annotators, who reported insufficient context, considered the concordance too ambiguous to choose between the senses of “sample”. Other unclear annotations, unless otherwise specified in another paragraph, show a similar situation.

  (28) op eigen verantwoordelijkheid gebeurde en een informeel karakter had . Er zou aan het staal geen enkele juridische waarde worden gehecht . De procureur stemde daarmee in . "
    happened under their own responsibility and had an informal character. The sample would be given no legal value whatsoever. The prosecutor agreed to that. "

Additional conventional usages: A small number of additional conventional usages of the target can be extracted from the not_listed annotations: the meaning of “strong” in expressions such as zenuwen/vrouw van staal ‘nerves/woman of steel’, a metonymic use referring to the steel industry as in (29), and one attestation of another homonym, shown in (30). (29) was annotated as staal_1 ‘steel-material’ by the majority, while the third annotator selected geen ‘none of the above’ with high confidence and suggested “als nijverheidsplek” ‘as industrial site’ as a more adequate tag. It is not the only attestation of this metonymical usage.

  (29) Antwerpse haven wel grote zorgen , want de overslagactiviteiten van onder meer fruit , papier en staal zijn de grote werkverschaffers in de Scheldehaven . Begin volgend jaar komt er een
    Port of Antwerp great worries indeed, because the transhipment activities of, among others, fruit, paper and steel are the great employers in the Schelde port. Early next year there comes a

Example (30), instead, received one staal_3 ‘sample-general’ annotation and two geen ‘none of the above’ with suggestions such as “stuk” ‘piece’ and “aanduiding van de hoeveelheid” ‘indication of quantity’. Here the target actually belongs to a different homonym (with a different article), which seems to apply to stick-like parts of plants, such as a stalk of leek.

  (30) prei en spek ( 4 personen ) Ingrediënten * 2 stalen prei * 150g spek * olijfolie * aardappelpuree
    leek and bacon (4 people) Ingredients * 2 leeks (lit. stalks of leek) * 150g bacon * olive oil * mashed potato

Creative usages: One of the not_listed annotations also pointed to a more creative usage of the target, shown in (31). Two of the annotators of this token selected staal_1 ‘steel-material’: one with maximum confidence, and one with medium confidence and a comment clarifying that it was a figurative use meaning “hetzelfde zijn” ‘being the same’. The third annotator assigned geen ‘none of the above’ and described the meaning of “uit hetzelfde staal gegoten zijn” ‘to be cast from the same steel’ as “op een gelijke manier te werk gaan” ‘to work in a similar way’. It does not seem to be a very conventional expression, though.

  (31) Het staat wel vast dat de senator geen opvolger zal krijgen die uit hetzelfde keiharde conservatieve staal is gegoten . Het electoraat in North Carolina is veel minder conservatief geworden ,
    It is indeed certain that the senator will not get a successor cast from the same rock-hard conservative steel. The electorate in North Carolina has become much less conservative,

Errors in the corpus: The one wrong_lemma token was annotated as such with full agreement and maximum confidence from all annotators, who noted that the target form was actually the past form of the verb stelen ‘to steal’.

Hard-to-parse contexts: The one case with no agreement, example (32), was assigned staal_3 ‘sample-general’ with medium confidence, staal_1 ‘steel-material’ with minimum confidence and unclear, also with minimum confidence. The target occurs in a very short sentence without punctuation marks, so it will also be hard for the current models to capture, but the fact that it occurs in the plural form (stalen rather than staal) already suggests the second homonym, “sample”, over the first one, “steel”.

  (32) ziet , is een groot vraagteken . Niet rooskleurig . " Nieuwe stalen Raymond De Backer vindt dat de Vlaamse regering sterk overdrijft in haar fosfaatpolitiek .
    sees, is a big question mark. Not rosy. " New samples Raymond De Backer thinks that the Flemish government strongly exaggerates in its phosphate policy.

Nephology of staal

A first impression of the clouds relates to the stress values of the dimensionality reduction and the parameters that make the strongest distinctions between models. We have 144 models of staal created on 10/03/2020, modeling between 313 and 319 tokens. The stress value of the NMDS solution for the cloud of models is 0.152.

Strength of parameters

The most dividing parameter is FOC-WIN, splitting the cloud of models along the second dimension. Within each half, divisions are not striking: there is a certain grouping based on PPMI (PPMI:weight models crammed to the left), but the main tendency would seem to be for FOC-POS:all | LENGTH:FOC | SOC-WIN:10 models to go to the center of the plot and the rest to expand outwards, especially towards the right (Figure 29). The stress values of the NMDS solutions of these models range between 0.199 and 0.253.

Figure 29. Cloud of models of ‘staal’ colored by FOC-POS and PPMI. Explore it here: https://montesmariana.github.io/NephoVis/level1.html?type=staal

Figure 30 shows the distances between pairs of models arranged by the number of parameters they share and colored by a parameter value. The top panels show that pairs of models that share SOC-WIN:4 are more different from each other than those that share SOC-WIN:10 or have different values of SOC-WIN, and that the opposite happens with FOC-POS:nav. The bottom panels, on the other hand, suggest that PPMI:weight models are most similar to each other and that, while models with different FOC-WIN are a bit more different from each other than those with the same FOC-WIN, the difference is less striking than with the other parameters.

Figure 30. Distances between models of ‘staal’ by number of shared parameters.

Figure 31 zooms in on the pairs of models that only differ in one parameter (column 5 in Figure 30) and shows that the individual effect of the parameters is relatively low; the effect of FOC-POS is lower for PPMI:weight models, and the effect of all parameters is higher with SOC-WIN:4.

Figure 31. Distances between models of ‘staal’ that vary along only one parameter, colored by PPMI.

First order filters

Figure 32 shows the quantitative effect of the first order filters. The panels to the left show the number of remaining tokens (top) and first order context words (bottom) after applying each first order filter, and the right panel shows the number of remaining context words per token after applying each filter.

Almost no tokens are lost to the first order parameter filters. FOC-POS reduces the number of context words per token more than FOC-WIN and PPMI do, without reducing the total number of context words as much (the remaining context words occur less often).

Figure 32. Remaining tokens and context words of ‘staal’ after application of first order filters.

Model comparison

To compare the effect of the stronger parameters, we fixed the weaker ones to SOC-POS:nav + LENGTH:FOC + SOC-WIN:4, initially discarding PPMI:selection.

The distances between the selected models are not very high; the highest (above 0.5) are between the FOC-WIN:5 + FOC-POS:all + PPMI:no model and the PPMI:weight models, followed by the distance between the loosest and the strictest models (Distance matrix 6). Replacing PPMI:weight with PPMI:selection, the largest distances are between each of the FOC-POS:all + PPMI:no models and those with the opposite FOC-WIN, while a selection of PPMI:selection | PPMI:no models gives more weight to FOC-POS differences.

Distance matrix 6. Distance matrix between some models of ‘staal’
id FOC-POS PPMI FOC-WIN
1 nav weight 10_10
2 nav no 10_10
3 all weight 10_10
4 all no 10_10
5 nav weight 5_5
6 nav no 5_5
7 all weight 5_5
8 all no 5_5

Without color coding, we can see that PPMI:no models tend to have a dense core with some outliers in the NMDS solutions, while PPMI:weight models are more evenly dispersed; in the t-SNE solutions, some reasonable clusters begin to show with perplexity of 20, taking a nicer shape with perplexity of 30 for the PPMI:weight models and a somewhat less interesting one with perplexity of 50.

Color coding shows that homonyms are well distinguished; the subclouds are even almost separated in NMDS solutions of restrictive models. The “sample” homonym sticks together in all t-SNE solutions, while the “steel” tokens are widely spread, forming two or three clusters that become clear with perplexity of 20 or 30 but not so much with 5 or 50. Senses are not clearly distinguished in the NMDS solutions, but staal_4 ‘sample-evidence’ seems to cluster neatly in all t-SNE solutions (especially in PPMI:weight models with perplexity of 20 or higher), in a small group with other tokens of the same homonym (Figure 33). Some of the tokens of this sense remain scattered around the cloud of the other homonym, so they require further inspection.

Figure 33. Tokens of ‘staal’ in the t-SNE solutions (perplexity 30) of the selected models

FOC-POS:nav models seem to have better separability than their FOC-POS:all counterparts, but once that parameter is chosen, neither FOC-WIN nor PPMI nor any of the other parameters seems to offer any sort of improvement.

“staal” seems to model the distinction between homonyms neatly, but has a hard time identifying the annotated senses. Because of the shape of the subclouds and the variety between the options, models with FOC-WIN:10 + FOC-POS:nav and SOC-POS:nav + SOC-WIN:4 + LENGTH:FOC will be examined further, with variations along PPMI and maybe also the second order parameters.


stof

The noun stof was tagged with 5 definitions, reproduced in Table 13. Both its homonyms are polysemous, although the first (and most frequent) one presents quite distinct senses, while the distinction in the second one is more subtle. The first homonym includes the senses of “substance” (stof_1), “fabric” (stof_2) and “topic” (stof_3), while the second one corresponds to “dust”, either in the air (stof_4) or as a powder-like state of a substance (stof_5). The first homonym is expected to be twice as frequent as the second one, with stof_3 ‘topic’ relatively infrequent; for the second homonym, the different senses were not discriminated in the pilot annotation, but it was noted that most of the occurrences corresponded to idiomatic expressions.

Table 13. Definitions of ‘stof’.
tag code definition example freq
stof_1 1.1 materie, substantie van een bepaald type giftige stoffen, vaste stof, grijze stof 15
stof_2 1.2 weefsel wollen en katoenen stoffen 11
stof_3 1.3 onderwerp waarover men spreekt, schrijft, nadenkt etc. stof voor een roman, stof tot onenigheid 4
stof_4 2.1 massa zeer kleine droge deeltjes van verschillende oorsprong, door de lucht meegevoerd een wolk stof, stof afnemen 10
stof_5 2.2 massa zeer kleine deeltjes als toestand van een specifieke substantie iets tot stof vermalen, tot stof verpulveren 0

Sense distribution

The sample consists of 320 tokens (8 batches) out of 24502 occurrences in the QLVLNewsCorpus; the distribution of the majority senses of each batch, as well as the pilot-based estimate and the overall distribution, are reproduced in Figure 34. The distributions of the annotations (not majority senses) by annotator are shown in Figure 35. Batches 2 and 7 were annotated by 4 annotators. Three annotators participated in both batches.

The overall distribution is quite similar to the expected one, except that the stof_4 ‘dust’ sense was tagged much more frequently than stof_5 ‘powder’, which is not too problematic considering that such a distinction was not made in the pilot sample. The proportions vary across batches but not drastically.

Figure 34. Distribution of majority senses of ‘stof’ per batch

Roughly, the proportions seem quite robust across annotations within batches, but there seems to be a split between annotators that favor stof_3 ‘topic’ or stof_4 ‘dust’, even if they belong to different homonyms. The choice between stof_4 ‘dust’ and stof_5 ‘powder’ also seems to depend on the annotator.

Figure 35. Distribution of sense annotations of ‘stof’ per annotator, grouped by batch.

“stof” is a noun with two homonyms of different frequencies, both polysemous, the most frequent having one frequent sense and two less frequent ones, and the infrequent one having two senses with a skewed distribution.

Confusion matrix

Matrices

The confusion matrix between the majority senses and other tagged senses can be seen in Table 14 (raw number of tokens with such senses assigned) and Table 15 (mean confidence of such sense annotation in each token).

We would expect quite some confusion between the two senses of the “dust” homonym (stof_4 and stof_5), more than between the senses of the first homonym; some confusion between stof_1 ‘substance’ and the “dust” senses could also be acceptable, even if the definitions themselves point to different homonyms.

In general terms, there is not that much overlap between the senses: the proportion of tokens with full agreement is quite high for the first homonym and lower for the second, but only between its senses. The highest confusion between homonyms is between stof_3 ‘topic’ and stof_4 ‘dust’, which is linked to the individual tendencies found in Figure 35. These are mostly idiomatic expressions stemming from stof_4 ‘dust’ but with a predominant “topic” theme; they will be discussed in the “Examples” subsection.

Table 14. Non-weighted sense matrix of ‘stof’ senses. Proportion of tokens with full agreement per sense-tag is: stof_1: 0.9, stof_2: 0.85, stof_3: 0.69, stof_4: 0.54. Proportion of tokens with full agreement per homonym is: substance: 0.87, dust: 0.69.
(columns grouped by homonym: substance = stof_1, stof_2, stof_3; dust = stof_4, stof_5; geen = not_listed, unclear, wrong_lemma)
senses stof_1 stof_2 stof_3 stof_4 stof_5 not_listed unclear wrong_lemma
stof_1 141 2 0 6 6 1 0 0
stof_2 4 53 3 2 0 0 0 0
stof_3 1 2 54 9 2 2 2 1
stof_4 1 2 9 52 11 1 2 0
stof_5 2 1 0 3 5 0 0 0
unclear 0 0 0 2 0 0 2 0
no_agreement 3 1 9 11 3 5 3 0
total 152 61 75 85 27 9 9 1

While the confidence levels for the senses of the first homonym seem high, those of the second one tend to be rather low. The doubt probably comes from trying to choose between the subtly different senses of “dust”, rather than from distinguishing between homonyms.

Table 15. Weighted sense matrix of ‘stof’ senses. Mean confidence across the lemma is 4.16; values above it are shown darker and in bold. Median confidence across the lemma is 5.
(columns grouped by homonym: substance = stof_1, stof_2, stof_3; dust = stof_4, stof_5; geen = not_listed, unclear, wrong_lemma)
senses stof_1 stof_2 stof_3 stof_4 stof_5 not_listed unclear wrong_lemma
stof_1 4.25 4.5 0 2.5 3.17 3 0 0
stof_2 3.5 4.48 5 2.5 0 0 0 0
stof_3 4 2 4.47 4 4 3.5 0 0
stof_4 4 4.5 3.11 3.89 3.09 0 0 0
stof_5 3 5 0 2.67 3.2 0 0 0
unclear 0 0 0 3.5 0 0 1.75 0
no_agreement 3.67 4 3.61 3.41 3.67 3.6 2.33 0

Examples

Among the challenging concordances of this lemma, there are instances of inattentive annotation, additional conventional usages, creative usages, hard-to-parse contexts and reasonable ambiguities.

Inattentive annotation: Some of the annotations suggest a lack of attention from the annotator, assigning tags that make no sense, like stof_1 ‘substance’ and even stof_3 ‘topic’ in perfectly typical stof_2 ‘fabric’ concordances or, stranger yet, the one stof_3 ‘topic’ token with stof_1 ‘substance’ as alternative, where the minority sense is the most reasonable (33).

  (33) . In de jaren vijftig gebruikte de psychiatrie in de Verenigde Staten de werkzame stof om patiënten communicatiever te maken . Kersemakers zegt het Amerikaanse onderzoek niet te kennen
    . In the fifties, psychiatry in the United States used the active substance to make patients more communicative. Kersemakers claims not to know the American research

Furthermore, the two stof_4 ‘dust’ cases with stof_2 ‘fabric’ as alternative are quite typical examples of an annotator focusing too much on a key context word (gordijn ‘curtain’ in (34) and tapijt ‘carpet’ in (35)) and not really comprehending the context. Both of the concordances were annotated by four annotators, three of whom agreed on the majority sense; one of them added, for (35), that it was a figurative expression (probably from recognizing the register as a horoscope). (36) is similar and was annotated by the same annotators; here the same one who assigned stof_2 ‘fabric’ to (34) selected stof_3 ‘topic’, maybe motivated by the literary context and the metaphoricity of the expression “stof verzamelen” ‘gather dust’.

  (34) zit achter in de ziekenwagen , die ’ over de hobbelige weg een gordijn van opwaaiend stof achterliet , tatatatarataaaa !!! ’ Begeleid door de sirene zingen de Kuda Buxen eendrachtig
    sits in the back of the ambulance, which ‘left behind a curtain of rising dust over the bumpy road, tatatatarataaaa!!!’ Accompanied by the siren, the Kuda Buxen sing harmoniously
  (35) : U geniet van de overtollige aandacht . Gezondheid : Als een tapijt teveel stof bevat voor u , geef ze dan weg . Liefde : Spring niet al
    You enjoy the excessive attention. Health: If a carpet contains too much dust for you, give it away. Love: Don’t jump
  (36) is , zullen de meesten van deze schrijvers geleidelijk aan uit het collectief geheugen verdwijnen en stof verzamelen in de pakhuizen van de bibliotheken . Calvino was een Stendhalliefhebber - waardoor
    most of these writers will gradually disappear from the collective memory and gather dust in the warehouses of the libraries. Calvino was a Stendhal lover, so that

A phenomenon that could be considered the opposite of inattentive annotation is the practice whereby an annotator tries to be more thorough than is warranted. One instance is the stof_1 ‘substance’ token with not_listed as alternative, (37), which refers to “stoffen in de hersenen die verliefdheid veroorzaken” ‘substances in the brain that cause infatuation’; the comment for the not_listed tag suggested that it referred to chemical processes rather than substances, but there is nothing that points in that direction. Another is the stof_2 ‘fabric’ token with both stof_1 ‘substance’ and stof_4 ‘dust’ as alternatives, (38), where the stof_4 ‘dust’ annotator suggested the interpretation of “in de stof dringen” ‘soak into the fabric’ as “in de vergetelheid dringen” ‘push into oblivion’, although with minimum confidence (they may have skipped the “niet” ‘not’).

  (37) , in een levenslange schakel van serie-monogamie . ’ Na 36 maanden zijn de stoffen in de hersenen die verliefdheid veroorzaken opgebrand , maar partners hebben dat vaak pas na een
    , in a lifelong chain of serial monogamy. ’ After 36 months the substances in the brain that cause infatuation are burnt out, but partners often have that only after a
  (38) vlekwerende " stay clean“-behandelingen dringen zelfs vloeistoffen zoals olie , vruchtensap of water niet in de stof . De 3 by 1 wordt nu al beschouwd als een echte” all
    stain-resistant “stay clean” treatments, even liquids like oil, fruit juice or water don’t soak into the fabric. The 3 by 1 is already considered a real “all

Additional conventional usages: Some annotations reveal conventional usages that were not contemplated by the original list of senses, like idiomatic expressions that are figurative in nature but depict an image where the target can still be interpreted literally. That is the case of het stof doen opwaaien ‘kick/stir up the dust (create chaos)’, which was attested in all batches but received different annotations in each, with various combinations of stof_3 ‘topic’, stof_4 ‘dust’ and occasionally not_listed. Except for two single annotations, all annotators were consistent in which tag they assigned to these cases. (39) is a variation using the verb oplaaien ‘flare up’: it is most likely a typo, but one with potential for semantic blend.

  (39) aldus de Schotse kloonexpert Ian Wilmut . Antinori’s praktijken hebben al eerder het nodige stof doen oplaaien . In 1994 riep hij grote internationale verontwaardiging over zich af ,
    says the Scottish cloning expert Ian Wilmut. Antinori’s practices have stirred/flared up the necessary dust before. In 1994 he brought great international outrage upon himself,

Further expressions are “kort/lang van stof zijn” ‘have little/a lot to say, lit. be short/long in stuff (to say)’ and “door het stof gaan” ‘go through a lot of trouble, lit. go through the dust’. The former was attested in (40), annotated as stof_3 ‘topic’ by two annotators, one of whom acknowledged the figurative meaning, and stof_2 ‘fabric’ by the other one, who identified the idiom but indicated stof_2 ‘fabric’ as the meaning of the target inside of it (which was our original request… although the Klein Van Dale identifies this as a fixed expression under stof_3 ‘topic’). Two further occurrences were annotated as stof_3 ‘topic’ by the majority and not_listed by the other annotator, and one “lang van stof” ‘have a lot to say’ with stof_2 ‘fabric’, stof_3 ‘topic’ and not_listed.

  (40) hoofdredactioneel stukje spreekt voor zich . " Ook co-auteur Yoeri Albrecht is kort van stof : " Wij blijven achter ons artikel staan . In de politieke verslaggeving worden
    editorial piece speaks for itself. " Co-author Yoeri Albrecht is also short of words (lit. fabric): "We stand by our article. In political reporting

The expression “door het stof gaan” ‘go through a lot of trouble, lit. go through the dust’, exemplified in (41), was attested some five times with different combinations of stof_3 ‘topic’, the “dust” senses and geen ‘none of the above’.

  (41) Daarbij moet de krant soms en public door het stof . Zoals toen bleek dat een persbericht van radioprogramma Vroege Vogels , inclusief citaten
    Moreover, the newspaper sometimes has to bite the dust in public. Like when it turned out that a press release from the radio program Early Birds, including quotes

Creative usages: Next to additional but conventional usages, some more creative cases can also be found, like (42), which was annotated as stof_3 ‘topic’ by the majority, given its figurative meaning, but as stof_4 ‘dust’ by a minority focused on the source image of the metaphor. While it is probably a figurative meaning, it is arguably not “topic”.

  (42) klopten ’ ( hoewel historicus Alfred Koss man ooit klaagde dat Isings ’ werk ’ geen stof , geen vuil , geen hitte , geen zweet ’ bevatte ) . Het
    were true’ (although the historian Alfred Kossman once complained that Isings’ work contained ‘no dust/stuff, no dirt, no heat, no sweat’). The

Hard-to-parse contexts: A few concordances have a confusing context that can indeed be hard to classify. Such are the cases of (43) and (44) and two fragments from lyrical texts.

Example (43) was reported as having unclear context by two of the annotators, but another one assigned stof_4 ‘dust’, which is quite reasonable.

  (43) Na het stof de douche De tocht door de Hel zit er op . De
    After the dust the shower The trip through Hell is over. The

In (44) the target is part of the title of a literary piece and it is not clear what kind of stof is being talked about (beyond what the article indicates). Two of the annotators chose stof_3 ‘topic’ with high confidence, selecting context words such as literair (L4), magnum (R4), opus (R5) and opzicht (L3), while the other two, with minimum confidence, chose stof_2 ‘fabric’ and geen ‘none of the above’ respectively. The former pointed to rode (L0) and schildersoog (R9) as relevant context words, while the latter selected most of the words surrounding the target and pointed out that it looked like a title but they couldn't find information about it online (I did; the author is Ma Jian and it's called Red Dust in English).

  (44) nirwana , maar een grote ontnuchtering wacht . In literair opzicht blijft Het rode stof duidelijk achter bij Gao’s magnum opus , ondanks Ma’s schildersoog en zijn boeddhistisch getinte mijmeringen over
    nirvana, but a great disillusionment awaits. In literary terms, Red Dust clearly lags behind Gao’s magnum opus, despite Ma’s painter’s eye and his Buddhist-tinged reveries about

How hard it is to parse a concordance depends, of course, on the knowledge of the annotators. One of the unclear cases, which received stof_4 ‘dust’ as alternative, exemplified the expression “droge stof” ‘sediment, lit. dry substance’, which in practice takes a dust-like form but is a specific form of stof_1 ‘substance’.

Reasonable ambiguity: Most of the cases of confusion between stof_1 ‘substance’ and a “dust” sense as alternative are instances of substances that could, in principle, be found as dust or powder, such as “(lucht)vervuilende stoffen” ‘(air) polluting substances’.

Although it was not specified in the definitions, the two homonyms of stof have different grammatical gender. Therefore, in principle, occurrences in singular form should not present confusion between homonyms. If we assume this to be an active and relevant distinction, some of the cases of stof_1 ‘substance’ with “dust” as alternative should then be regarded as the result of inattentive annotation rather than actual ambiguity. However, even edited texts like the concordances themselves may select the wrong article, as in (45). Here, the most likely sense is stof_2 ‘fabric’ (and so is the majority sense) but the article corresponds to the “dust” homonym (the alternative sense is indeed stof_4 ‘dust’).

  (45) Maar schilderen heeft een materiële kant : de borstels , de verf , het stof , de kleren die je ervoor aan moet . Ik was daar niet goed
    But painting has a material side: the brushes, the paint, the fabric/dust?, the clothes you have to put on for it. I was not well

Nephology of stof

A first impression of the clouds relates to the stress values of the dimensionality reduction and the parameters that make the strongest distinctions between models. We have 144 models of stof created on 10/03/2020, modeling between 314 and 320 tokens. The stress value of the NMDS solution for the cloud of models is 0.158.

Strength of parameters

The strongest division between models is given by the FOC-WIN parameter along the vertical dimension; each half is further divided by FOC-POS and PPMI, with the PPMI:weight models closer to the center but following the same FOC-POS split of their FOC-WIN halves (Figure 36). This cloud of models also has small groups of outliers: the most evident are the FOC-POS:all + PPMI:selection + SOC-WIN:4 + SOC-POS:all models with LENGTH:5000 | LENGTH:10000 at the right corners and, to a lesser degree, the FOC-POS:nav + PPMI:weight + LENGTH:FOC ones on the left side. The stress values of the NMDS solutions of these models range between 0.203 and 0.263.

Figure 36. Cloud of models of ‘stof’. Explore it here: https://montesmariana.github.io/NephoVis/level1.html?type=stof

Figure 37 illustrates the pairwise distances between models by number of shared parameters, colored by PPMI and FOC-POS separately, while Figure 38 reveals their interaction. The left panel of Figure 37 shows that pairs of models that share PPMI:weight tend to be more similar to each other than other pairs of models that share the same number of parameters, but as the number of shared parameters increases, so does the similarity between pairs of models that share PPMI:no or PPMI:selection; besides, the distance between PPMI:no and PPMI:weight models remains very similar regardless of how many parameters are shared between them. The right panel shows that pairs of models with different FOC-POS tend to be more different from each other than those that share it, and that this difference is more important when there are more shared parameters.

Figure 37. Distances between models of ‘stof’ by number of shared parameters, colored by whether they share PPMI and FOC-POS.

Figure 38 shows some interaction between the strength of PPMI and FOC-POS: FOC-POS:all models seem more similar to each other if they also share PPMI:weight, FOC-POS:nav are more similar to each other if they share PPMI:selection, and models with different FOC-POS are more different from each other when one is PPMI:no and the other one PPMI:no | PPMI:selection.

Figure 38. Distances between models of ‘stof’ by number of shared parameters, colored by whether they share FOC-POS and split by PPMI.

Focusing on the pairs of models that vary in only one parameter just confirms what the right panel of Figure 37 shows, namely that FOC-POS and PPMI have the strongest individual effects (the distances between models with different values for them are much higher than between models that share them), and that the individual effect of FOC-POS is greater with PPMI:weight (no extra plot needed).

First order filters

Figure 39 shows the quantitative effect of the first order filters. The panels to the left show the number of remaining tokens (top) and first order context words (bottom) after applying each first order filter, and the right panel shows the number of remaining context words per token after applying each filter.

Very few tokens are lost to first order restrictions, and only when combining FOC-POS:nav with another one. PPMI keeps the greatest number of context words per token while reducing the total number as much as FOC-WIN and much more than FOC-POS.

Figure 39. Remaining tokens and context words of ‘stof’ after application of first order filters.

Model comparison

To compare the effect of the stronger parameters, the weaker ones will be fixed to SOC-POS:nav + SOC-WIN:4 + LENGTH:FOC, initially discarding PPMI:selection. The distances in the distance matrix (Distance matrix 7) range between 0.2 (between the models with FOC-POS:all + PPMI:weight, followed by 0.23 for the models with FOC-POS:all + PPMI:no) and 0.77 (between the strictest and the weakest restrictions). The strictest model is the one most different from all the rest. The difference is a bit less drastic when PPMI:no is replaced by PPMI:selection (where the strictest model is still the most different from the rest), and the picture changes when comparing PPMI:no and PPMI:selection: the strictest and the loosest models are the most different, but the difference between models that only differ in PPMI is also the smallest (between 0.18 and 0.21).

Distance matrix 7. Distance matrix between some models of ‘stof’
id FOC-POS PPMI FOC-WIN
1 nav weight 10_10
2 nav no 10_10
3 all weight 10_10
4 all no 10_10
5 nav weight 5_5
6 nav no 5_5
7 all weight 5_5
8 all no 5_5

Without color coding, we can see that NMDS solutions with PPMI:no tend to have a dense core with some outliers (rather like satellites) and that neat clusters start to form with perplexity of 20 in the t-SNE solutions. There is one particular small ball in the periphery, especially in the PPMI:weight models, and two big subclouds, especially in the FOC-POS:nav models. This structure becomes much clearer with perplexity 30, but raising it to 50 merges those big subclouds and hides the small ball in the PPMI:no models. The structure is less clear with LENGTH:5000 instead of LENGTH:FOC, to the point that PPMI:no t-SNE models quickly merge into one cluster.

Color coding lets us see that homonyms seem to group together in NMDS solutions, but rather spread around and with a lot of overlap; senses also seem to group together but with big overlap, and stof_5 ‘powder’ is quite dispersed in PPMI:no models. In t-SNE models, there is no clear division of homonyms with perplexity of 5, but it does become clear at 20. The little ball, persistent across all models, has tokens of different senses, but that is an artifact of the annotators' confusion, since those tokens clearly correspond to the idiom “de stof doen opwaaien”. From perplexity 30, one of the big clouds belongs to stof_1 ‘substance’, while the other one is split between stof_2 ‘fabric’ and stof_4 ‘dust’ (different homonyms!) and there is a smaller, less compact one for stof_3 ‘topic’ (Figure 40). Perplexity 50 does not improve the structure (the stof_3 ‘topic’ cloud even merges with the rest); only for FOC-POS:nav models does it stay relatively clear.

Figure 40. Tokens of ‘stof’ in the t-SNE solutions (perplexity 30) of the selected models

To look at the variation between weaker parameters, the stronger ones were set to a combination with nice separability in the t-SNE solutions: FOC-POS:nav + PPMI:weight + FOC-WIN:10, disregarding LENGTH:10000. The model with SOC-POS:nav + SOC-WIN:4 + LENGTH:FOC is the most different from the rest and keeps a good separability of the clusters. The models with LENGTH:FOC keep the stof_3 ‘topic’ cluster clearly away from the rest, while LENGTH:5000 leaves it more dispersed and closer to the bigger masses.

The “stof” models don’t seem too bad at identifying homonyms (except for the fabric-dust cluster) and even cluster the “stof doen opwaaien” ‘create chaos/controversy, lit. lift up dust’ tokens, which received different annotations. For further inspection a selection of models with FOC-POS:nav + PPMI:weight and LENGTH:FOC + SOC-POS:nav + SOC-WIN:4 will be examined.


schaal

The noun schaal was tagged with 6 definitions, reproduced in Table 16. Both homonyms are polysemous but have very different frequencies. The first one, roughly equivalent to (abstract) “scale”, is estimated to present mostly the “on a big scale” sense (schaal_3) and fewer cases of specific scales, either specifying the relation between sizes (schaal_2) or with a name or range (schaal_1). The second, infrequent homonym was mostly registered in the sense of “dish” (schaal_5) but could also refer to the shell of an animal (schaal_4) or the dishes of a scale (schaal_6).

Table 16. Definitions of ‘schaal’.
tag code definition example freq
schaal_1 1.1 een geordende reeks cijfers, afstanden, hoeveelheden e.d. waarmee iets gemeten wordt de schaal van Celsius, Richter, op een schaal van 1 tot 5 0
schaal_2 1.2 de verhouding tussen de grootte van iets en de weergave ervan in een kaart, model, grafiek etc. een schaal van 1:20, een schaal van 10 km 6
schaal_3 1.3 grootteorde, omvang de schaal van een probleem, op grote/kleine schaal 24
schaal_4 2.1 harde buitenbekleding van zekere organische zaken de schaal van een ei, de schalen van een mossel 0
schaal_5 2.2 ondiepe en wijde schotel een schaal met vruchten 4
schaal_6 2.3 elk van de beide schotels die aan de armen van een balans hangen gewicht in de schaal leggen 0

Sense distribution

The sample consists of 320 tokens (8 batches) out of 14249 occurrences in the QLVLNewsCorpus; the distribution of the majority senses of each batch, as well as the pilot-based estimate and the overall distribution, are reproduced in Figure 41. The distributions of the annotations (not majority senses) by annotator are shown in Figure 42. Batch 8 was annotated by 4 annotators.

The proportions don’t seem robust across batches: the majority of the tokens were annotated as schaal_3 ‘scale-size’ and schaal_5 ‘dish’ has a relative frequency between 10 and 15% in all but two batches, but the rest are rather unstable. The overall distribution, however, does resemble the estimate, adding some presence of the senses schaal_1 ‘scale-range’ (not distinguished from schaal_2 ‘scale-transformation’ in the pilot sample) and schaal_6 ‘dish-scale’. Sense schaal_4 ‘shell’ appears to occur only once in the whole concordance.
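
A minimal sketch of how such per-batch proportions can be computed, assuming a long-format table with one row per token and hypothetical column names `batch` and `majority_sense`:

```python
import pandas as pd

# Hypothetical input: one row per token, with its batch and majority sense.
ann = pd.read_csv("schaal_tokens.csv")

per_batch = (
    ann.groupby("batch")["majority_sense"]
       .value_counts(normalize=True)
       .unstack(fill_value=0)
)
overall = ann["majority_sense"].value_counts(normalize=True)
# comparing `per_batch` rows against `overall` (and against the pilot estimate)
# shows how robust the proportions are across batches
print(per_batch.round(2), overall.round(2), sep="\n")
```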

Figure 41. Distribution of majority senses of ‘schaal’ per batch

In rough terms, the proportion of senses across annotators within batches is quite stable and most variation occurs between senses of the same homonym. The most evident differences involve three annotators with a preference for one of the senses of the first homonym in batches 1, 6 and 7.

Figure 42. Distribution of sense annotations of ‘schaal’ per annotator, grouped by batch.

“schaal” is a noun with two homonyms of different frequencies, both polysemous with senses of different frequencies.

Confusion matrix

Matrices

The confusion matrix between the majority senses and other tagged senses can be seen in Table 17 (raw number of tokens with such senses assigned) and Table 18 (mean confidence of such sense annotation in each token).

We would expect no confusion between tokens of different homonyms, but disagreement between the senses within each homonym would be acceptable. Confusion in the “dish” set of senses could be attributed to unclear or unspecified contexts, while that between “scale” senses could also be due to a lack of understanding of the differences between them.

There is indeed little overlap, both within and between homonyms; the latter always involves schaal_3 ‘scale-size’. The one schaal_4 ‘shell’ token has full agreement and high confidence; three tokens have unclear as majority sense and only 7 show no agreement. Agreeing annotations have mostly high confidence values.
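
Both matrices can be derived from the long annotation table; the sketch below assumes one row per (token, annotator) with hypothetical column names, and operationalizes the majority sense as the tag chosen by more than half of the annotators:

```python
import pandas as pd

# Hypothetical input: one row per (token, annotator), with the assigned sense
# tag and the confidence value.
ann = pd.read_csv("schaal_annotations.csv")

def majority_sense(tags):
    counts = tags.value_counts()
    return counts.index[0] if counts.iloc[0] > len(tags) / 2 else "no_agreement"

ann["majority"] = ann.groupby("token")["sense"].transform(majority_sense)

# Table 17 analogue: number of tokens in which a tag co-occurs with each majority sense
dedup = ann.drop_duplicates(subset=["token", "sense"])
raw = pd.crosstab(dedup["majority"], dedup["sense"])

# Table 18 analogue: mean confidence of each tag per majority sense
weighted = ann.pivot_table(index="majority", columns="sense",
                           values="confidence", aggfunc="mean")
```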

Table 17. Unweighted sense matrix of ‘schaal’ senses. Proportion of tokens with full agreement per sense-tag is: schaal_1: 0.93, schaal_2: 0.75, schaal_3: 0.93, schaal_4: 1, schaal_5: 0.95, schaal_6: 0.79, unclear: 0.33. Proportion of tokens with full agreement per homonym is: geen: 0.2, scale: 0.98, dish: 0.9.
(columns grouped by homonym: scale = schaal_1–3; dish = schaal_4–6; geen = between, not_listed, unclear)
senses schaal_1 schaal_2 schaal_3 schaal_4 schaal_5 schaal_6 between not_listed unclear
schaal_1 30 0 2 0 0 0 0 0 0
schaal_2 1 12 2 0 0 0 0 0 0
schaal_3 8 3 208 0 0 1 2 1 0
schaal_4 0 0 0 1 0 0 0 0 0
schaal_5 0 0 1 0 39 0 1 0 0
schaal_6 0 0 1 0 1 19 0 2 0
unclear 0 0 1 0 1 0 0 0 3
no_agreement 5 2 5 0 2 3 0 2 3
total 44 17 220 1 43 23 3 5 6
Table 18. Weighted sense matrix of ‘schaal’ senses. Mean confidence across the lemma is 4.49; values above it are darker and bolded. Median confidence across the lemma is 5.
(columns grouped by homonym: scale = schaal_1–3; dish = schaal_4–6; geen = between, not_listed, unclear)
senses schaal_1 schaal_2 schaal_3 schaal_4 schaal_5 schaal_6 between not_listed unclear
schaal_1 4.59 0 5 0 0 0 0 0 0
schaal_2 5 4.42 2.5 0 0 0 0 0 0
schaal_3 3.75 3.67 4.56 0 0 3 2.5 2 0
schaal_4 0 0 0 4.75 0 0 0 0 0
schaal_5 0 0 1 0 4.51 0 2 0 0
schaal_6 0 0 5 0 5 4.51 0 4 0
unclear 0 0 4 0 5 0 0 0 1.39
no_agreement 2.8 3.5 3.8 0 4 2.33 0 1.5 3.33

Examples

Among the challenging concordances of this lemma, there are instances of inattentive annotation, additional conventional usages and hard to parse contexts. Additionally, there are situations that could be attributed to inattentive annotation but are probably due to insufficient understanding of the (relatively subtle) distinction between the senses of the “scale” homonym.

Inattentive annotation: There are 8 schaal_3 ‘scale-size’ tokens with schaal_1 ‘scale-range’ as alternative: they all belong to batch 6, where the same annotator favored schaal_1 ‘scale-range’ in concordances that even matched the example for the schaal_3 ‘scale-size’ definition: 6 were cases of “op grote schaal” ‘on a large scale’ and the other two of “op dergelijke schaal” ‘on such a scale’.

All the other disagreements between senses of the “scale” homonym, or between them and geen ‘none of the above’, can instead be attributed to insufficient understanding of the definitions and the consequent difficulty in extrapolating them to slightly different circumstances. There are cases like “op landelijke schaal” ‘on a national scale’, “op industriële schaal” ‘on an industrial scale’ and “op iedere schaal” ‘on any scale’, along with examples (46) and (47). These are schaal_3 ‘scale-size’ tokens with between as alternative: the hesitating annotator, the same in both cases, hesitated between schaal_1 ‘scale-range’ and schaal_3 ‘scale-size’ but did not find the latter appropriate because it was not “de schaal van een probleem” ‘the scale of a problem’ (the first of the examples in the definition).

  (46) feest van de hoogst individuele expressie en de behoefte aan authenticiteit . En de schaal van de menselijke maat wordt opnieuw uitgevonden . Waar anders dan in het woninginterieur
    feast/celebration of the highest individual expression and the need for authenticity. And the scale of the human measure is invented again. Where else but in the interior of the home
  (47) tien jaar geleden de westerse wasmachine . Kijk , dat is globalisering op menselijke schaal . Een ouderwetse bak nog , die je met water vult , waarin je
    ten years ago the western washing machine [appeared]. Look, that is globalization on a human scale. Still an old-fashioned tank that you fill with water, in which you

Additional conventional usages: The cases of confusion between senses of the “dish” homonym and sometimes between them and senses of the “scale” homonym or geen ‘none of the above’ reveal two conventional usages that were not contemplated in the initial list of definitions: specialization of schaal_5 ‘dish’ in sport contexts (with the role of a trophy, called “shield” in English) and metaphorical expressions derived from the image of putting weights on a scale.

The sport specialization of schaal_5 ‘dish’ was attested in at least four concordances in three different batches, always with at least one annotator assigning schaal_5 ‘dish’, and is exemplified in (48), which is probably the only case in which the sentence makes sense without previous knowledge of this practice, and (49). (48) was assigned schaal_5 by the majority, with schaal_3 as alternative; (49) takes it a step further by adding a metonymic reference to a competition (as is often done with “Cup”): this concordance received three unclear annotations and one schaal_5 ‘dish’.

  (48) niets . Geen beker , geen titel . PSV krijgt straks de schaal die hoort bij de kampioen van Nederland . De verklaring voor het Brabantse succes
    nothing. No cup, no title. PSV will soon receive the shield that belongs to the champion of the Netherlands. The explanation for the Brabantian success
  (49) het voorbije seizoen een paar keer nadrukkelijk in de kijker . Won met de schaal Sels en het Kampioenschap van Vlaanderen twee wedstrijden . - Anekdote : " De
    the previous season a couple of times very much in the spotlight. Won two matches with the Sels shield and the Championship of Flanders. - Anecdote: "The

Finally, in the four schaal_6 ‘dish-scale’ cases with alternative senses and at least two more with mostly geen ‘none of the above’ tags, the target was used in idiomatic expressions involving schaal_6 ‘dish-scale’, such as “zijn gewicht in de schaal werpen” ‘use one’s influence, lit. throw one’s weight on (a dish of) the scale’, “gewicht in de schaal leggen” ‘be of importance/influence, lit. place weight on (a dish of) the scale’ and variations thereof. Only a minority of the annotators reported that it was a figurative expression.

Hard to parse contexts: Some concordances have ambiguous or unclear contexts, mostly requiring lexical knowledge that the annotators apparently lacked. Other than the aforementioned cases of the shield, examples (50), (51) and (52), all from different batches, exemplify this situation.

Example (50) belonged to the same batch as the eight schaal_3 ‘scale-size’ tokens with schaal_1 ‘scale-range’ as alternative: the majority sense was again schaal_3 ‘scale-size’ and the same disagreeing annotator assigned schaal_6 ‘dish-scale’ instead. The expression “glijdende schaal”, aka “hellend vlak” ‘slippery slope’, refers to a rhetorical argument; the annotators were clearly unaware of it, but given the combination with glijdend ‘sliding’, a “dish” sense is more reasonable than a “scale” one.

  (50) wapengebruik . Juist daarom vreest Geoffrey Bindman , een vooraanstaand mensenrechtenactivist , een glijdende schaal : " De bewapening van agenten om criminaliteit de kop in te drukken is iets wat
    criminal use of weapons. Precisely because of that Geoffrey Bindman, a prominent human rights activist, fears a slippery slope [argument] (lit. a sliding dish): "Arming agents in order to stamp out crime is something that

Example (51) received two geen ‘none of the above’ tags reporting insufficient context and one schaal_3 ‘scale-size’ tag, while (52) was assigned schaal_1 ‘scale-range’ (with minimum confidence), schaal_3 ‘scale-size’ and unclear. In both cases the target has the specific sense of ‘pay scale’, which may be closest to schaal_1 ‘scale-range’ or could even be considered a separate sense of the “scale” homonym. Its most relevant cue, other than hogere ‘higher’ and laagste ‘lowest’ (which were selected by most annotators), is CAO ‘CLA, collective labor agreement’, which typically includes pay scales: none of the annotators selected it. This is a conventional sense but probably not active in the annotators’ encyclopedic knowledge, which made it hard for them to assign a sense or even suggest a new one.

  (51) , zijn geen twee aanstellingen aan elkaar gelijk . Zeker niet in de hogere schalen . Meijerink wil volstaan met een globale CAO die door de individuele instellingen kan
    no two appointments are the same. Definitely not in the higher pay scales. Meijerink is content with a global CLA that can […] by the individual institutions
  (52) . Ze ziet daarbij als mogelijkheid om in bestaande CAO’s nog een extra laagste schaal er bij te maken om doorstroming en het aannemen van jongeren te bevorderen .
    Moreover she sees the possibility of adding an extra lowest pay scale to existing CLAs in order to promote through-flow and the employment of younger people.

Nephology of schaal

A first impression of the clouds relates to the stress values of the dimensionality reduction and the parameters that make the strongest distinctions between models. We have 144 models of schaal created on 10/03/2020, modeling between 308 and 320 tokens. The stress value of the NMDS solution for the cloud of models is 0.121.
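
For reference, a minimal sketch of how such a non-metric MDS solution and its stress value can be computed, assuming a precomputed matrix of pairwise distances between models and that the reported value is Kruskal's stress-1:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical input: square matrix of pairwise distances between the 144 models.
model_dist = np.load("schaal_model_distances.npy")

nmds = MDS(
    n_components=2,
    metric=False,                 # non-metric MDS
    dissimilarity="precomputed",
    normalized_stress=True,       # Kruskal's stress-1 (scikit-learn >= 1.2)
    random_state=0,
)
coords = nmds.fit_transform(model_dist)
print(nmds.stress_)  # comparable to the 0.121 above if stress-1 is indeed meant
```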

Strength of parameters

The subclouds of the cloud of models are less distinct than for other lemmas, but color coding does show important groupings, with FOC-POS:all + PPMI:selection | PPMI:no on the right side, PPMI:weight in the top left quadrant and FOC-POS:nav + PPMI:selection | PPMI:no, more or less, in the bottom left quadrant, closer to the PPMI:weight area than to the FOC-POS:all area. Furthermore, SOC-WIN:10 models tend towards the center of the plot, while SOC-WIN:4 models go towards the outside (Figure 43). The stress values of the NMDS solutions of these models range between 0.192 and 0.263.

Figure 43. Cloud of models of ‘schaal’. Explore it here: https://montesmariana.github.io/NephoVis/level1.html?type=schaal

In order to compare the effects of the strongest parameters, I selected clouds with SOC-POS:nav + LENGTH:FOC + PPMI:weight | PPMI:no. I compared sets with different SOC-WIN separately and, while there are some differences (particularly under certain conditions), I cannot decide which one works better yet.

FOC-POS, PPMI and SOC-WIN seem to have the biggest impact on pairwise distances between models, and the first two interact with each other: Figure 44 shows the pairwise distance between models by number of shared parameters, colored by FOC-POS and split by PPMI, and Figure 45 shows the same boxplots colored by SOC-WIN.

For this lemma, PPMI:weight does not make models that share it drastically more similar to each other: PPMI:no and PPMI:selection do so as well, especially for models that share FOC-POS. PPMI:weight models resemble models with different PPMI if they share FOC-POS:nav: if both or neither of the models have PPMI:weight, they resemble each other more when they share FOC-POS, slightly more if it’s FOC-POS:all. Models that share all but one parameter are most different if that varying parameter is FOC-POS and most similar if FOC-POS:nav is one of the common parameters (unless they vary in PPMI:weight | PPMI:no).
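
A minimal sketch of the underlying computation, with toy stand-ins for the model specifications and the distance function:

```python
from itertools import combinations

# Toy stand-ins for the real pipeline: the parameter settings of each model
# and a distance function between model solutions.
models = {
    "m1": {"FOC-POS": "nav", "PPMI": "weight", "FOC-WIN": 10, "SOC-WIN": 4},
    "m2": {"FOC-POS": "nav", "PPMI": "no",     "FOC-WIN": 10, "SOC-WIN": 4},
    "m3": {"FOC-POS": "all", "PPMI": "weight", "FOC-WIN": 5,  "SOC-WIN": 10},
}

def model_distance(a, b):
    return 0.5  # placeholder for the real model-to-model distance

pairs = []
for a, b in combinations(models, 2):
    pa, pb = models[a], models[b]
    shared = {k: v for k, v in pa.items() if pb[k] == v}
    pairs.append({"pair": (a, b), "n_shared": len(shared),
                  "shared": shared, "distance": model_distance(a, b)})
# grouping `pairs` by n_shared (x axis) and coloring by the shared or differing
# value of one parameter reproduces boxplots like Figures 44 and 45
```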

Figure 44. Distances between models of ‘schaal’ by number of shared parameters, colored by FOC-POS and split by PPMI.

SOC-WIN:4 seems to make models more different from each other: if models share only that value (blue boxplot in column 1 of Figure 45), they are more different from each other than if they don’t share any parameter (column 0). (This is also the case for all previous nouns.)

Figure 45. Distances between models of ‘schaal’ by number of shared parameters, colored by SOC-WIN.

Zooming in on the pairs of models that share all but one parameter and color coding by PPMI values (Figure 46), we can see that the interaction of PPMI:weight with the individual effect of FOC-POS is weaker than for other lemmas, while another interaction appears that was only attested in hoop (Figure 13): larger distances for models that differ in LENGTH or SOC-POS when PPMI:weight is involved.

Figure 46. Distances between models of ‘schaal’ that vary along only one parameter, colored by PPMI.

First order filters

Figure 47 shows the quantitative effect of the first order filters. The panels to the left show the number of remaining tokens (top) and first order context words (bottom) after applying each first order filter, and the right panel shows the number of remaining context words per token after applying each filter.

Very few tokens are lost to the first order parameters: tokens are only lost with FOC-POS:nav in combination with another filter, up to 3.75% of them with all filters applied.
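
A minimal sketch of how these counts can be derived, assuming each token maps to (word, pos, ppmi) context tuples and that FOC-POS:nav keeps nouns, adjectives and verbs (both assumptions):

```python
# Assumed interpretation: FOC-POS:nav keeps nouns, adjectives and verbs;
# the PPMI filter keeps positive-PPMI collocates.
NAV = {"noun", "adj", "verb"}

def apply_filters(tokens, pos_filter=False, ppmi_filter=False):
    remaining = {}
    for token_id, context in tokens.items():
        kept = [(w, pos, ppmi) for w, pos, ppmi in context
                if (not pos_filter or pos in NAV)
                and (not ppmi_filter or ppmi > 0)]
        if kept:  # a token with no remaining context words is lost
            remaining[token_id] = kept
    return remaining

# counting the remaining tokens, the distinct context words and the context
# words per token after each filter combination gives the panels of Figure 47
```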

Figure 47. Remaining tokens and context words of ‘schaal’ after application of first order filters.

Model comparison

In the distance matrices between these models (Distance matrices 8 and 9), the highest value is between the strictest and the least strict model (0.72 with SOC-WIN:10, 0.75 with SOC-WIN:4). With SOC-WIN:4 (Distance matrix 8), the distance matrix singles out the FOC-POS:all + PPMI:no models as the most different from the rest, while most similar to each other (0.35). With SOC-WIN:10 instead (Distance matrix 9), FOC-WIN:5 makes for a bigger difference between models that differ in PPMI (0.65-0.7), but not so much when they differ in FOC-POS.
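
One plausible way of filling in such a matrix (an assumption for illustration, not necessarily the measure used in this study) is to compare the token-by-token distance matrices of two models:

```python
import numpy as np

def model_distance(dist_a, dist_b):
    """One plausible measure (an assumption): 1 - Pearson r between the upper
    triangles of the token-by-token distance matrices of the two models."""
    iu = np.triu_indices_from(dist_a, k=1)   # upper triangle, no diagonal
    r = np.corrcoef(dist_a[iu], dist_b[iu])[0, 1]
    return 1.0 - r
```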

Distance matrix 8 (SOC-WIN:4) and Distance matrix 9 (SOC-WIN:10). Distance matrices between some models of ‘schaal’, both over the following models:
id FOC-POS PPMI FOC-WIN
1 nav weight 10_10
2 nav no 10_10
3 all weight 10_10
4 all no 10_10
5 nav weight 5_5
6 nav no 5_5
7 all weight 5_5
8 all no 5_5

For a first inspection of the strongest parameters, a selection of models with LENGTH:FOC + SOC-WIN:4 + SOC-POS:nav, initially discarding PPMI:selection, will be examined.

The NMDS solutions seem to be more dispersed (or less concentrated) with PPMI:weight, showing one big cloud with some satellites and even a smaller cloud close by (or two, for FOC-WIN:10 models), while the PPMI:no models show a dense round cloud with a longish arm to one side and, most clearly in the FOC-WIN:10 + FOC-POS:nav model, a smaller cloud in the periphery. The clear small cloud of the PPMI:weight models matches the arm of the PPMI:no models and is made up of most of the “dish” tokens: in the former, it contains most of the schaal_5 ‘dish’ tokens while the schaal_6 ‘dish-scale’ tokens form a slightly separate cloud nearby; in the latter, the end of the arm is made of schaal_5 ‘dish’ tokens and the schaal_6 ‘dish-scale’ tokens either connect it to the main cloud or are already swallowed by it. Finally, the tiny subcloud mostly visible in FOC-WIN:10 + FOC-POS:nav | PPMI:weight models is made up of most of the schaal_1 ‘scale-range’ tokens, brought together by the context word “Richter”. Models with PPMI:selection do not add separability to the clusters found in PPMI:no models.

The t-SNE solutions show three small clusters around a mass that turns from an archipelago with perplexity 5 into a more compact mass with perplexity 20 and a more dispersed one with perplexity 50. The three small clusters correspond quite well to tokens annotated with schaal_1 ‘scale-range’, schaal_5 ‘dish’ and schaal_6 ‘dish-scale’ (Figure 48); these three senses cluster quite neatly even in the NMDS solutions. Furthermore, PPMI:weight models show a small cluster of tokens with klein ‘small’ as the main shared context word (in the expression “op kleine schaal” ‘on a small scale’).
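
A quick way to verify what holds a subcloud together (e.g. that “Richter” drives the schaal_1 cluster, or klein the “op kleine schaal” one) is to count the context words shared by its tokens; a minimal sketch with hypothetical inputs:

```python
from collections import Counter

def shared_cues(cluster_tokens, contexts, n=10):
    """Most frequent context words among the tokens of one subcloud.
    `cluster_tokens`: ids of the tokens in the subcloud (hypothetical);
    `contexts`: token id -> list of context words (hypothetical)."""
    counts = Counter(w for token in cluster_tokens for w in contexts[token])
    return counts.most_common(n)
```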

Figure 48. Tokens of ‘schaal’ in the t-SNE solutions (perplexity 30) of the selected models

A main difference between models with different FOC-POS for this lemma is whether they make use of prepositions such as op and in. On the one hand, these do characterize certain usages; on the other, they are very frequent in the sample (particularly op), so they are not necessarily distinctive. In any case, FOC-POS:nav models do not perform worse than their FOC-POS:all counterparts, relying mostly on adjectives.

If SOC-WIN:4 is replaced with SOC-WIN:10, the NMDS solutions are a bit more dispersed, and the separability between subclouds therefore much lower, but the difference in the t-SNE solutions is not as visible. There does not seem to be much of a difference between LENGTH:FOC and LENGTH:5000 either, but in FOC-WIN:10 + FOC-POS:nav + PPMI:selection models the schaal_2 ‘scale-transformation’ tokens cluster with LENGTH:FOC but not with LENGTH:5000, and with SOC-WIN:4 but not SOC-WIN:10 (Figure 49).

Figure 49. Beautiful token cloud of ‘schaal’.

“schaal” shows a surprisingly good division between homonyms and senses, regardless of their frequencies. PPMI:selection shows a wider spread of schaal_3 ‘scale-size’ tokens than PPMI:weight and could have inner clusters. The cloud in Figure 49 will be examined further.


blik

The noun blik was tagged with 6 definitions, reproduced in Table 19. It has two polysemous homonyms, with a clearer polysemy in the most frequent one (with roughly equally frequent senses) and a more subtle metonymy (conceptually not challenging, but strongly dependent on the clarity of the context) in the other. The former corresponds to English “gaze/look”, with separate senses for physically looking at something (blik_1), the (eyes-focused) facial expression (blik_2) and the metaphorical, intellectual look (blik_3). The latter means “tin” and can refer to the material itself (blik_4), an object (like a tin can, blik_5) or canned food (blik_6). In the pilot sample no instances of blik_6 ‘tin-food’ were found and the distinction between blik_4 ‘tin-material’ and blik_5 ‘tin-object’ was neglected.

Table 19. Definitions of ‘blik’.
code | definition | example | freq
blik_1 | 1.1 oogopslag ‘glance’ | een blik werpen op iets, een blik van verstandhouding | 10
blik_2 | 1.2 gezichtsvermogen ‘eyesight, power of vision’ | een scherpe blik | 12
blik_3 | 1.3 inzicht, in intellectuele zin ‘insight, in an intellectual sense’ | een brede blik | 11
blik_4 | 2.1 dun geplet metaal, i.h. bijz. vertind dun plaatstaal ‘thinly rolled metal, esp. tinned thin sheet steel’ | dozen uit blik | 6
blik_5 | 2.2 voorwerp (i.h.bijz. doos voor voedsel) vervaardigd uit zulk materiaal ‘object (esp. a container for food) made of such material’ | stoffer en blik, een blik erwtjes, een maaltijd uit blik | 0
blik_6 | 2.3 voedsel bewaard in een voorwerp als bedoeld in 2.2 ‘food preserved in an object as described in 2.2’ | eet je niet teveel blik? | 0

Sense distribution

The sample consists of 280 tokens (7 batches) out of 22175 occurrences in the QLVLNewsCorpus; the distribution of the majority senses of each batch, as well as the pilot-based estimate and the overall distribution, are reproduced in Figure 50. The distributions of the annotations (not majority senses) by annotator are shown in Figure 51. No batch was annotated by 4 annotators.

The pilot sample seems to have underestimated the frequency of blik_1 ‘look-gaze’, which appears to be as frequent as blik_2 ‘look-expression’ and blik_3 ‘look-intellectual’ together in the overall distribution and is the most frequent sense in all batches. Between blik_4 ‘tin-material’ and blik_5 ‘tin-object’, the annotators seem to prefer the latter sense, and only two out of the 280 tokens were primarily tagged as blik_6 ‘tin-food’. There is not a high number of cases with no agreement between the annotators, and they are mainly concentrated in two batches, one of which has no tokens with a “tin” sense as majority sense.

Figure 50. Distribution of majority senses of ‘blik’ per batch

The distribution of homonyms seems to be stable across annotators within batches, but that is not the case for the senses. In most batches, the preference for blik_1 ‘look-gaze’, blik_2 ‘look-expression’ or even blik_3 ‘look-intellectual’ seems to vary widely between annotators. The imbalance is less evident for the “tin” homonym, but that has to do with its lower frequency.

Figure 51. Distribution of sense annotations of ‘blik’ per annotator, grouped by batch.

“blik” is a noun with two homonyms of different frequencies, each with three senses and skewed frequencies.

Confusion matrix

Matrices

The confusion matrix between the majority senses and other tagged senses can be seen in Table 20 (raw number of tokens with such senses assigned) and Table 21 (mean confidence of such sense annotation in each token).

We would expect no confusion at all between the homonyms, and probably more confusion between “tin” senses than between “look” senses. Indeed, there is only one case of confusion between homonyms (see (53)); as we can see in Table 21, the confidence of that annotation is low.

5.36% of tokens have no agreement between annotators; in most of the cases the disagreement is between senses of the “look” homonym. This will be discussed in the “Examples” subsection. At the homonym level, in fact, the proportion of tokens with full agreement is very high in both cases.

Table 20. Unweighted sense matrix of ‘blik’ senses. Proportion of tokens with full agreement per sense-tag is: blik_1: 0.52, blik_2: 0.22, blik_3: 0.44, blik_4: 0.6, blik_5: 0.68. Proportion of tokens with full agreement per homonym is: gaze: 0.98, tin: 0.94.
(columns grouped by homonym: gaze = blik_1–3; tin = blik_4–6; geen = between, not_listed, unclear, wrong_lemma)
senses blik_1 blik_2 blik_3 blik_4 blik_5 blik_6 between not_listed unclear wrong_lemma
blik_1 162 56 21 0 0 0 1 0 0 0
blik_2 27 37 1 0 0 0 0 1 0 0
blik_3 13 5 34 1 0 0 0 0 0 0
blik_4 0 0 0 5 2 0 0 0 0 0
blik_5 0 0 0 5 22 2 0 0 0 0
blik_6 0 0 0 1 1 2 0 0 0 0
not_listed 0 1 0 0 0 0 0 1 0 0
unclear 1 0 0 0 0 0 0 0 1 0
wrong_lemma 1 0 0 0 0 0 0 0 0 1
no_agreement 12 12 11 2 3 0 0 2 3 0
total 216 111 67 14 28 4 1 4 4 1
Table 21. Weighted sense matrix of ‘blik’ senses. Mean confidence across the lemma is 3.79; values above it are darker and bolded. Median confidence across the lemma is 4.
(columns grouped by homonym: gaze = blik_1–3; tin = blik_4–6; geen = between, not_listed, unclear, wrong_lemma)
senses blik_1 blik_2 blik_3 blik_4 blik_5 blik_6 between not_listed unclear wrong_lemma
blik_1 4.01 3.62 2.76 0 0 0 0 0 0 0
blik_2 3.93 3.18 4 0 0 0 0 3 0 0
blik_3 3.77 2.8 3.68 3 0 0 0 0 0 0
blik_4 0 0 0 4.13 3 0 0 0 0 0
blik_5 0 0 0 3 4.2 5 0 0 0 0
blik_6 0 0 0 4 5 3.5 0 0 0 0
not_listed 0 3 0 0 0 0 0 2.5 0 0
unclear 3 0 0 0 0 0 0 0 3.5 0
wrong_lemma 4 0 0 0 0 0 0 0 0 3.5
no_agreement 3.75 3.58 3.36 2.5 2.67 0 0 0 1.67 0

Examples

Among the challenging concordances of this lemma, there are instances of inattentive annotation, creative usages, errors in the corpus and hard to parse contexts, including cases of reasonable ambiguity.

The definitions of the “look” homonym were apparently not clear enough for the annotators. On the one hand, the distinction between the senses is a matter of construal and all three could, in principle, be present to a certain degree in any given context: while taking a look (blik_1) at something, the observer’s eyes express a certain attitude (blik_2) and the observer themselves may gain some insight (blik_3). Both the specificity of the concordance and the annotators’ personal biases towards one aspect or another may play a role in which senses they consider most active, and such tokens could be considered cases of reasonable ambiguity. On the other hand, the actual nature of the distinction between the senses seems to have been unclear to some annotators, to the point that some of them suggested an “alternative” sense that actually matches blik_2 ‘look-expression’. Hence, for this lemma, “inattentive annotation” covers superficial reading of both the concordances and the definitions.

Inattentive annotation: The annotations with either geen ‘none of the above’ or disagreement between homonyms are mostly cases of inattentive annotation. First, in the three cases of confusion between blik_2 ‘look-expression’ and not_listed, which belong to the same batch, the alternative suggestion actually matches blik_2 (“manier van kijken” ‘way of looking’ and “emoties” ‘emotions’). Second, the one case with disagreement between homonyms, (53), was most likely a misclick: the selected context words were more or less the same for all annotators and nothing points to a “tin” interpretation on the part of the annotator who selected it. Finally, the one case with unclear as majority sense, which had blik_1 ‘look-gaze’ as alternative, (54), is rather an instance of blik_2 ‘look-expression’.

  (53) één en dezelfde fotograaf . Stuk voor stuk zijn ze gemaakt met eenzelfde sobere blik die tegelijkertijd afstandelijk en inlevend is . Alles op die foto’s is bedoeld om
    by one and the same photographer. One by one they were taken with one and the same sober look that is at the same time detached and empathizing. Everything on those photographs is meant to
  (54) , broek , kousjes , handschoenen , helm , schoenen ) . Zelfs de blikken waren eender : poeltjes van ellende . Een voorlopig hoogtepunt werd een week later
    , pants, stockings, gloves, helmet, shoes). Even their gazes were the same: small pools of misery. A temporary highlight came a week later

Creative usages: One of the tokens with no agreement illustrates a figurative version of an expression normally applied to food, (55). One annotator assigned blik_5 ‘tin-object’ with high confidence and suggested it might be a figurative expression, another assigned blik_4 ‘tin-material’ with medium confidence and the third assigned geen ‘none of the above’ with minimum confidence and reported it as a figurative expression. The presence of the article, however, is not typical.

  (55) Toch zullen ze eraan moeten wennen . Homoseksuelen komen niet uit het blik . Ze zijn er . Overal . In elke stad
    but they will have to get used to it. Homosexuals don’t come from a tin can. They are here. Everywhere. In every city

Errors in the corpus: The one annotation with wrong_lemma as majority sense is actually an instance of the verb blikken ‘to look’, which is certainly related to blik_1 ‘look-gaze’, the alternative annotation.

Hard to parse contexts: Some concordances were difficult to understand and categorize, as shown by the disagreement between the annotators. (56) was assigned blik_1 ‘look-gaze’, blik_2 ‘look-expression’ (which looks more likely) and unclear, while (57) received blik_2 ‘look-expression’ and geen ‘none of the above’ with different explanations. The former most likely refers to an expression, but the latter is so vague that it could even be either of the homonyms (I think?). Finally, (58) received blik_4 ‘tin-material’, blik_5 ‘tin-object’ and unclear: it refers to a medal, which is not literally made of tin.

  (56) leuk , zei Simons , ’ ik ben Johan Simons van Hollandia . Blanco blik . Dat trek ik me dan persoonlijk aan , ja . ’
    , said Simons, ‘I am Johan Simons from Hollandia. Blank look. Then I take that personally, yes.’
  (57) lul eruit . " Hoe is het , soldaat ? Is mijn blik niet goed genoeg voor je ? Was hij soms niet goed genoeg voor je
    like an asshole. "How is it, soldier? Is my look not good enough for you? Was he perhaps not good enough for you
  (58) . Nieuweling Eggenkamp was gelukkig : " Mijn eerste finale in Luzern en gelijk blik . " Slag Bartman meent dat nog verbeteringen kunnen worden aangebracht , waardoor goud
    . Novice Eggenkamp was happy: “My first final in Luzern and already?? metal.” Slag?? Bartman thinks that improvements can still be made, so that gold

Reasonable ambiguity: Beyond the hard to parse contexts, the rest of the tokens with no agreement are instances of the “look” homonym that, depending on the aspect of the situation that the annotator highlights, could in principle correspond to any of its senses. See for example the “onderzoekende blikken” ‘inquisitive/investigative looks’ of (59), where someone is actually looking at something (blik_1 ‘look-gaze’) but with intellectual insight as the goal (blik_3 ‘look-intellectual’), which is reflected in their expression (blik_2 ‘look-expression’). Five of these cases belonged to the same batch (3), and in all but one of them each individual annotator assigned the same tag.

  (59) Bijkomen in een buitenlands ziekenhuis . Onderzoekende blikken boven je bed . Vage herinneringen aan sirenes en zwaailichten . Vorig
    Recovering in a hospital abroad. Inquisitive looks over your bed. Vague memories of sirens and flashing lights. The previous

Normally, I can select one of the senses with little hesitation, but unless all the cases are attributed to inattentive annotation (even the kind related to not reading the definitions properly), it does suggest that the distinction might not be pertinent in these cases.

Even the annotations with some agreement and alternatives within the same “look” homonym probably reflect the same phenomenon. Figure 51 already suggests that these disagreements are not exactly random but linked to individual tendencies towards one or the other sense.

Nephology of blik

A first impression of the clouds relates to the stress values of the dimensionality reduction and the parameters that make the strongest distinctions between models. We have 144 models of blik created on 11/03/2020, modeling between 261 and 280 tokens. The stress value of the NMDS solution for the cloud of models is 0.148.

Strength of parameters

The main groups in the cloud of models come from an interaction between FOC-POS and PPMI, with FOC-POS:all + PPMI:selection | PPMI:no on the right side, FOC-POS:all + PPMI:weight in the middle and FOC-POS:nav on the left side, inside which PPMI:no is distinct from PPMI:weight | PPMI:selection (Figure 52). The FOC-POS:all + PPMI:selection | PPMI:no group is split by PPMI along the vertical axis and by SOC-WIN along the horizontal one, but no other parameters make clear groupings. The stress values of the NMDS solutions of these models range between 0.191 and 0.296.

Figure 52. Cloud of models of ‘blik’ colored by PPMI. Explore it here: https://montesmariana.github.io/NephoVis/level1.html?type=blik

Figure 53 shows the effect of FOC-POS and PPMI on the pairwise distance between models and their interaction: the distance between models with the same FOC-POS and PPMI is smaller than between models that don’t share them; it’s greater between models with different FOC-POS if they don’t share PPMI:weight; and when only one of the models has PPMI:weight, FOC-POS:nav yields a smaller distance.

Figure 53. Distances between models of ‘blik’ by number of shared parameters, colored by FOC-POS and split by PPMI.

The plot of individual effects of the parameters, Figure 54, is similar to what is seen in hoop (Figure 13) in that PPMI:weight makes for smaller distances between models with different FOC-POS or FOC-WIN, but larger between models of different LENGTH. SOC-WIN:4 also increases the difference between models of different FOC-POS.

Figure 54. Distances between models of ‘blik’ that vary along only one parameter, colored by PPMI.

First order filters

Figure 55 shows the quantitative effect of the first order filters. The panels to the left show the number of remaining tokens (top) and first order context words (bottom) after applying each first order filter, and the right panel shows the number of remaining context words per token after applying each filter.

FOC-POS:nav empties some tokens, especially in combination with PPMI, eliminating 6.79% of them when all filters are applied. It also reduces the number of context words per token most drastically, while keeping the total number of context words relatively high: if only PPMI is applied, the number of context words per token is higher and the total number much lower.

Figure 55. Remaining tokens and context words of ‘blik’ after application of first order filters.

Model comparison

To compare the strongest parameters, a set of models with SOC-WIN:4 + SOC-POS:nav + LENGTH:FOC will be inspected, initially discarding PPMI:selection.

The distance matrix between the selected models (Distance matrix 10) shows relatively high values, the smallest being 0.35 (between the FOC-POS:all + PPMI:weight models, 3 and 7) and the largest 0.94 (between model 5, FOC-WIN:5 + FOC-POS:nav + PPMI:weight, on the one hand, and the FOC-POS:all + PPMI:no models, 4 and 8, on the other; the latter present the largest values overall and a distance of 0.48 between each other). If PPMI:no is replaced with PPMI:selection, the largest differences are between models with different FOC-POS, which is drastically enhanced when PPMI:no and PPMI:selection are compared.

Distance matrix 10. Distance matrix between some models of ‘blik’
id FOC-POS PPMI FOC-WIN
1 nav weight 10_10
2 nav no 10_10
3 all weight 10_10
4 all no 10_10
5 nav weight 5_5
6 nav no 5_5
7 all weight 5_5
8 all no 5_5

The NMDS solutions show two outliers to which the PPMI:no + FOC-WIN:10 + FOC-POS:all model is very sensitive, so that all other tokens are joined together in the center and far away from them. Their FOC-POS:nav, FOC-WIN:10 and PPMI:selection counterparts are much less sensitive. Their concordances are reproduced and discussed in the “Outliers” subsection. In the t-SNE clouds they tend to be located in the periphery of the larger clouds, but further inspection is required to see if they are being clustered with other tokens.
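
Such outliers can also be spotted before the dimensionality reduction by ranking tokens by how isolated they are in the model's own distance matrix; a minimal sketch with a hypothetical input:

```python
import numpy as np

def outlier_ranking(D, top=5):
    """Rank tokens by how isolated they are in one model, given its square
    token-by-token distance matrix `D` (hypothetical input)."""
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.nan)        # ignore self-distances
    typical = np.nanmedian(D, axis=1)  # each token's typical distance to the rest
    return np.argsort(typical)[::-1][:top]  # indices of the most isolated tokens
```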

The t-SNE solutions start presenting some clusters at perplexity 20, but PPMI:no + FOC-WIN:10 + FOC-POS:all models are then just uniform masses. The clusters are clearer with perplexity 30 but still part of the bigger, dispersed cloud.

The “tin” homonym does tend to group in all solutions, particularly in PPMI:weight models, but even t-SNE solutions don’t set it apart so clearly. The same goes for blik_2 ‘look-expression’ and blik_3 ‘look-intellectual’, which take up their own areas in the NMDS models but within the bigger “look” cloud.

There are, however, two clear clusters in the t-SNE solutions (perplexity 20 or higher) that match not sense tags but collocations: one for “een blik werpen” ‘throw a glance’ and one for “de blik richten” ‘direct the gaze’. They can be seen in PPMI:weight models and also in models with FOC-POS:nav if perplexity is lower than 50 (Figure 56). These clusters and that of blik_5 ‘tin-object’ seem more distinguishable from the main, chaotic cloud in PPMI:weight | PPMI:selection + FOC-POS:nav models. Once FOC-POS:nav is selected, the rest of the parameters don’t make that much of a difference visually.
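
A minimal sketch for flagging these collocations directly in the concordances (hypothetical input and deliberately rough patterns):

```python
import re

# Rough patterns for the two collocations; they allow inflected verb forms
# (werpen/wierp/geworpen, richten/richtte/gericht) within a short window.
PATTERNS = {
    "een blik werpen": re.compile(
        r"\been blik\b.{0,30}\bw[ieo]rp|\bw[ieo]rp\w*.{0,30}\been blik\b"),
    "de blik richten": re.compile(
        r"\bde blik(ken)?\b.{0,30}\bricht|\bricht\w*.{0,30}\bde blik(ken)?\b"),
}

def collocation_tags(concordances):
    """concordances: token id -> raw concordance line (hypothetical)."""
    return {token: [name for name, pattern in PATTERNS.items()
                    if pattern.search(line)]
            for token, line in concordances.items()}
```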

Figure 56. Tokens of ‘blik’ in the t-SNE solutions (perplexity 30) of the selected models

The models of “blik” are not very successful at discriminating the senses used in the annotation (but neither were the annotators), although to a certain degree they can distinguish the homonyms. However, they do tell apart some fixed constructions, with t-SNE models clustering cases of “een blik werpen” and “de blik richten”. For further inspection, models with FOC-POS:nav + FOC-WIN:10 + PPMI:selection | PPMI:weight for first order parameters and SOC-POS:nav + SOC-WIN:4 + LENGTH:FOC for second order parameters will be selected, and variations of one parameter may be compared as well.

Outliers

The outliers in the NMDS models are (60) below and (56) above. The first one was unanimously assigned the blik_5 ‘tin-object’ tag, with confidence values of 3 and 4, and in most NMDS solutions it is placed next to another token of the same sense. Without any filter, it has 9 first order features (8 of which are nouns or adjectives), all of them with positive PMI with blik/noun. The second outlier was discussed above and reproduced as (56). Even the NMDS models that are not so sensitive to outliers push it to the periphery: with or without filters, the only valid context word is “blanco/adj”, with a PPMI of 2.01.

  (60) ; 225 gram vet varkensgehakt ; 175 verse , geschilde waterkastanjes of 85 gram waterkastanjes uit blik ; 1 theelepel zout ; theelepel versgemalen zwarte peper ; 3 eetlepels fijngehakte lente-uien ; 1
    225g minced pork fat; 175 fresh, peeled water chestnuts or 85g canned water chestnuts (lit. water chestnuts from a can); 1tsp salt; tsp freshly ground black pepper; 3 tbsp finely chopped spring onions;
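
For reference, the PPMI values mentioned here follow the usual definition (a sketch assuming log base 2 and raw co-occurrence counts):

```python
import math

def ppmi(cooc, freq_w, freq_c, total):
    """Positive PMI from co-occurrence counts (log base 2, as is common):
    ppmi(w, c) = max(0, log2(p(w, c) / (p(w) * p(c)))).
    cooc: co-occurrences of target w and context c; total: sum of all counts."""
    if cooc == 0:
        return 0.0
    return max(0.0, math.log2((cooc * total) / (freq_w * freq_c)))
```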

spoor

The noun spoor was tagged with 8 definitions, reproduced in Table 22. There are three homonyms with an uneven distribution. The first and most frequent one corresponds roughly to “trace” and comprises senses such as the physical footprint (spoor_1), the trace or evidence of (previous) presence (spoor_2), traces of a substance in another (spoor_3) and the figurative trace or path to follow (spoor_4, which was not looked for in the pilot annotation). The second one refers to the railways (spoor_5) and its metonymic extensions to trains (spoor_6) and railway companies (spoor_7). The last homonym means “spur”; it is almost never used in its literal sense in this corpus, but it does occur in the fixed idiomatic expression “zijn sporen verdienen” ‘to prove one’s skills or aptitude for something, lit. earn one’s spurs’.

Table 22. Definitions of ‘spoor’.
code | definition | example | freq
spoor_1 | 1.1 afdruk door iets of iemand op z’n weg achtergelaten ‘imprint left by something or someone along its way’ | het spoor van een fiets op een zandweg, een spoor van vernieling | 3
spoor_2 | 1.2 blijk van aanwezigheid door iets of iemand (ongewild) achtergelaten ‘evidence of presence left (unintentionally) by something or someone’ | naar sporen zoeken, iemand op het spoor komen | 17
spoor_3 | 1.3 kleine hoeveelheid ‘small quantity’ | sporen van lood in het leidingwater | 10
spoor_4 | 1.4 te volgen of gevolgde weg in figuurlijke zin ‘path to follow or being followed, in a figurative sense’ | het juiste spoor | 0
spoor_5 | 2.1 weg met twee rijen metalen staven waarover treinen e.d. rijden ‘track with two rows of metal bars over which trains and the like run’ | niet op het spoor lopen! | 3
spoor_6 | 2.2 de trein als vervoermiddel ‘the train as a means of transport’ | met het spoor reizen | 1
spoor_7 | 2.3 spoorwegbedrijf ‘railway company’ | bij het spoor werken, het spoor staakt | 2
spoor_8 | 3 metalen punt of wieltje aan de hiel van een rijlaars, gebruikt om het rijdier te prikkelen ‘metal point or small wheel on the heel of a riding boot, used to goad the mount’ | zijn sporen verdienen | 3

Sense distribution

The sample consists of 360 tokens (9 batches) out of 37307 occurrences in the QLVLNewsCorpus; the distribution of the majority senses of each batch, as well as the pilot-based estimate and the overall distribution, are reproduced in Figure 57. The distributions of the annotations (not majority senses) by annotator are shown in Figure 58. Batch 2 was annotated by 4 annotators.

The overall distribution shows some minor differences from the expected one. There are fewer cases of spoor_3 ‘traces-substance’, quite a number of spoor_4 ‘traces-figurative’ and also many tokens for which the annotators didn’t reach an agreement. The whole “railway” homonym seems quite infrequent (still, 10% of the tokens), and while the “spur” homonym was expected to be very infrequent, it does occur in 4 of the 9 batches.

Figure 57. Distribution of majority senses of ‘spoor’ per batch

The sense distribution across annotators within batches seems relatively stable at the homonym level but quite diversified at sense level.

Figure 58. Distribution of sense annotations of ‘spoor’ per annotator, grouped by batch.

“spoor” is a noun with three homonyms of different frequencies, the most frequent of which are polysemous with skewed frequencies.

Confusion matrix

Matrices

The confusion matrix between the majority senses and other tagged senses can be seen in Table 23 (raw number of tokens with such senses assigned) and Table 24 (mean confidence of such sense annotation in each token).

We expect no overlap between homonyms, except maybe between spoor_4 ‘trail-figurative’ and the “railway” senses. The metonymic relations between the senses of this second homonym are also probably harder to determine than those between the senses of the first one, but at the same time its low frequency makes comparison hard.

There is indeed almost no overlap between homonyms, or even between spoor_3 ‘traces-substance’ and spoor_4 ‘trail-figurative’. The few annotations of spoor_8 ‘spur’ that do not match the majority sense are cases of inattentive annotation. A relatively high number of tokens (9.44%) shows no agreement between annotators, which can be expected given the number of options even within a homonym; 27 of the 34 such tokens have disagreement between senses of the “traces” homonym. The different reasons are discussed in the “Examples” subsection.

Table 23. Unweighted sense matrix of ‘spoor’ senses. Proportion of tokens with full agreement per sense-tag is: spoor_1: 0.29, spoor_2: 0.44, spoor_3: 0.41, spoor_4: 0.54, spoor_5: 0.71, spoor_6: 0.5, spoor_7: 0.5, spoor_8: 1. Proportion of tokens with full agreement per homonym is: traces: 0.88, railway: 0.93, spur: 1.
(columns grouped by homonym: traces = spoor_1–4; railway = spoor_5–7; spur = spoor_8; geen = between, not_listed, unclear, wrong_lemma)
senses spoor_1 spoor_2 spoor_3 spoor_4 spoor_5 spoor_6 spoor_7 spoor_8 between not_listed unclear wrong_lemma
spoor_1 49 27 3 4 0 0 0 0 0 1 0 0
spoor_2 41 129 10 14 0 0 0 1 0 2 4 0
spoor_3 5 10 22 1 0 0 1 0 0 0 0 0
spoor_4 5 14 0 72 2 0 0 0 0 7 5 0
spoor_5 1 0 0 2 31 3 3 0 0 0 0 0
spoor_6 0 0 0 0 1 2 0 0 0 0 0 0
spoor_7 0 0 0 0 2 2 8 0 0 0 0 0
spoor_8 0 0 0 0 0 0 0 10 0 0 0 0
not_listed 0 0 0 0 1 0 0 0 0 1 0 0
unclear 1 0 0 0 0 0 1 0 0 0 2 0
no_agreement 25 25 6 12 3 3 0 4 2 9 7 1
total 127 205 41 105 40 10 13 15 2 20 18 1

Overall, the confidence given to the annotations of this lemma is quite low; only those of spoor_8 ‘spur’ (the only sense in its homonym), spoor_5 ‘railway’ and some low-frequency disagreeing annotations are above the median, and the agreeing annotations of spoor_2 ‘trace-evidence’ and spoor_3 ‘traces-substance’ are above the mean, but so are some other disagreeing annotations. These low numbers probably have to do with the high number of possible senses within each homonym and the difficulty of distinguishing between them in the context given by the concordance.

Table 24. Weighted sense matrix of ‘spoor’ senses. Mean confidence across the lemma is 3.58; values above it are darker and bolded. Median confidence across the lemma is 4.
(columns grouped by homonym: traces = spoor_1–4; railway = spoor_5–7; spur = spoor_8; geen = between, not_listed, unclear, wrong_lemma)
senses spoor_1 spoor_2 spoor_3 spoor_4 spoor_5 spoor_6 spoor_7 spoor_8 between not_listed unclear wrong_lemma
spoor_1 3.42 3.33 4 2.5 0 0 0 0 0 0 0 0
spoor_2 3.59 3.77 3.5 3.43 0 0 0 3 0 2.5 2.75 0
spoor_3 3.4 3.4 3.89 2 0 0 5 0 0 0 0 0
spoor_4 2.2 3.21 0 3.53 4.5 0 0 0 0 3.14 1.2 0
spoor_5 2 0 0 2 4.04 5 3.67 0 0 0 0 0
spoor_6 0 0 0 0 5 3.33 0 0 0 0 0 0
spoor_7 0 0 0 0 1 2.5 3.92 0 0 0 0 0
spoor_8 0 0 0 0 0 0 0 4.28 0 0 0 0
not_listed 0 0 0 0 5 0 0 0 0 3 0 0
unclear 2 0 0 0 0 0 5 0 0 0 1.5 0
no_agreement 2.7 3.24 3 2.33 3.67 3 0 2.5 0 2.44 1.71 5

Examples

Among the challenging concordances of this lemma, there are instances of inattentive annotation, additional conventional usages, creative usages and hard to parse contexts, including some cases of reasonable ambiguity. Among the greater issues with the annotation of this lemma are (i) the ambiguity between spoor_1 ‘(foot)print’ and spoor_2 ‘trace-evidence’ in cases where the difference is not evident or relevant and (ii) the difficulty of understanding the relation between spoor_1 ‘(foot)print’ and spoor_4 ‘trail-figurative’. The idea behind spoor_4 ‘trail-figurative’ is that someone leaves a trail (of footprints) behind them, and by following it one follows the same path, i.e. acts in the same way, experiences the same things. The image of this trail left by others who have walked it before can be applied to a number of scenarios, from following someone’s steps to being or putting someone on the right path. However, if the image of the path itself is more prominent than its relationship to the traces that gave it origin, the annotator may lean towards spoor_5 ‘railway’.

Inattentive annotation: Most of the tokens with spoor_2 ‘traces-evidence’ or spoor_4 ‘trail-figurative’ as majority sense and either a different homonym or geen ‘none of the above’ as alternative are cases of inattentive annotation, where the disagreeing annotator didn’t pay enough attention either to the concordance or to the definitions, as in (61), or in any case didn’t fully understand the definition (as in (63)).

Example (61) was tagged as spoor_2 ‘traces-evidence’ by the majority and spoor_8 ‘spur’ by the third annotator, which must be a slip on their part: spoor_8 ‘spur’ occurs almost exclusively in the expression “zijn sporen verdienen” ‘lit. earn one’s spurs’, referring to a situation where someone proves their worth and skill at a task. That also makes (62) a case of inattentive annotation, since only one of the annotators assigned spoor_8 ‘spur’ (with maximum confidence), one assigned spoor_2 ‘traces-evidence’ with minimum confidence and one geen ‘none of the above’ with minimum confidence, reporting hesitation between spoor_2 ‘traces-evidence’ and spoor_4 ‘trail-figurative’. For now this seems to be the only spoor_8 ‘spur’ case with disagreeing annotations.

  (61) zegt burgemeester Janssens . " Met de helikopter zal het makkelijker zijn om het spoor van de overvallers te blijven volgen . " Maar ook de klopjacht naar de
    says Mayor Janssens. “With the helicopter it will be easier to keep following the trail of the attackers.” But also the manhunt for the
  (62) Brink waren auteurs van Nederlandse musicals die , voordat Van den Ende zich aandiende , hun sporen hadden verdiend in het cabaret . In hun musicals lag de nadruk op taal
    Brink were authors of Dutch musicals who, before Van den Ende presented himself, had proved their worth (lit. earned their spurs) in cabaret. In their musicals the emphasis was on language

Example (63) received two spoor_4 ‘trail-figurative’ tags with high confidence and a geen ‘none of the above’ tag with minimum confidence and the comment that there was not enough context, despite the expression “op het spoor zitten” ‘be on the (right) track’ being a typical example of spoor_4 ‘trail-figurative’.

  (63) studiedienst samenstellen . Tegen oktober volgend jaar moet duidelijk zijn of we op het spoor zitten . Maar ik wil wel degelijk mijn stempel drukken op de vernieuwingsbeweging .
    set up a research department. By October of next year it must be clear whether we are on track. But I definitely do want to leave my mark on the renewal movement.

About a third of the tokens with no agreement between senses but full agreement on the “traces” homonym (9 tokens) also exemplify this situation. (64), for example, received spoor_1 ‘footprint’, spoor_2 ‘traces-evidence’ and even spoor_3 ‘traces-substance’ with medium or high confidence. The spoor_3 ‘traces-substance’ annotation had maximum confidence and the explanation that it referred to a small quantity of information, which goes to show the extent to which definitions and interpretations can be stretched to fit each other. The expression “van X is (er) geen spoor (te bekennen)” ‘there is no trace of X (to be seen)’ is a typical example of spoor_2 ‘traces-evidence’.

  (64) , want ik heb ervaring . " Van Hennie van Doeselaar is nog geen spoor te bekennen . Geruchten gaan dat hij het vliegtuig van 15.25 uur niet haalt
    , because I have experience." Of Hennie van Doeselaar there is still no trace to be found. Rumor has it that he will not make the 15.25 plane

Another third corresponds to impressions/traces of abstract entities, as in (65): the example “een spoor van vernieling” ‘a trail of destruction’ in the spoor_1 ‘footprint’ definition suggests it as the appropriate sense tag, but the annotators chose different options.

  (65) moord noemden ze " haat tegen buitenlanders " . De drie toonden geen enkel spoor van berouw , maar ontkenden de bedoeling te hebben gehad Adriano te vermoorden .
    murder they called “hate towards foreigners”. The three of them showed not a single sign (lit. trace) of remorse, but denied having had the intention of killing Adriano.

Additional conventional usages: Some of the challenging tokens illustrate conventional usages that were not contemplated in the original list of senses. The clearest one is the fixed expression “het spoor bijster raken” ‘to lose one’s way’, linked to spoor_4 ‘trail-figurative’: in two batches it received spoor_4 ‘trail-figurative’ as majority sense and geen ‘none of the above’ as alternative, in another one it was the other way around, and in a fourth one it received spoor_1 ‘footprint’, spoor_2 ‘traces-evidence’ and spoor_8 ‘spur’ (the last by an annotator who assigned this tag 4 times, only once correctly).

Further conventional senses are a literal racing track (examples (66) and (67)), a thinking track (68) and two variations on spoor_2 ‘traces-evidence’, namely “dood spoor” ‘dead end’ and the general sense of ‘clue’ in (69) and (70) respectively.

Example (66) was assigned spoor_1 ‘footprint’, spoor_4 ‘trail-figurative’ and geen ‘none of the above’, whose annotator reported that the expression was unknown to them and that it might be related to one of the senses, maybe the last one (spoor_8 ‘spur’?!). (67), on the other hand, was assigned spoor_1 ‘footprint’, spoor_5 ‘railway’ and spoor_8 ‘spur’: one from each homonym.

  (66) Stessens bereikte het eerste keerpunt in de aanloopronde met een lang gerokken groep in zijn spoor . Bij het ingaan van de drie grote ronden vormde er zich een kopgroepje
    Stessens reached the first turning point of the opening lap with a long stretched-out group in his track. Going into the three big laps, a small leading group formed
  (67) dit keer aanvankelijk weinig van . Alle bochten van Wolvega werden in het derde spoor gerond . Voor de doorsnee draver is zoiets te veel van het goede ,
    little of [it] this time initially. All the curves of Wolvega were rounded in the third track. For the average trotter something like that is too much of a good thing,

The majority of the annotators of (68) reported insufficient context, and the third one suggested spoor_1 ‘footprint’. The example of “dead end”, (69), was assigned spoor_1 ‘footprint’, spoor_2 ‘traces-evidence’ and spoor_4 ‘trail-figurative’ by the same annotators that tagged (67) and in the same order. Finally, (70), referring to clues left by an author, was assigned spoor_1 ‘footprint’, spoor_2 ‘traces-evidence’ and spoor_4 ‘trail-figurative’.

  (68) voor de Federale Beleidsnota Drugs van minister van Volksgezondheid Magda Aelvoet . De twee sporen die zich de voorbije dagen aftekenden , liggen nog altijd netjes naast elkaar op de regeringstafel
    for the Federal Policy Memo on Drugs by the minister of Public Health Magda Aelvoet. The two tracks that emerged over the last days still lie neatly next to each other on the government table
  (69) eiland in 1974 zijn alle handels- en economische contacten tussen Ankara en Nikosia op een dood spoor beland . De kwestie-Cyprus was wel het belangrijkste obstakel voor een akkoord met Turkije
    island in 1974, all trade and economic contacts between Ankara and Nicosia reached a dead end (lit. dead track). The Cyprus issue was indeed the most important obstacle for an agreement with Turkey
  (70) , de gotische sfeer die bijna iedere gebeurtenis in een verdacht licht plaatst , de valse sporen die vooral de lezer op het verkeerde been zetten , de theorieën over het wezen van
    , the gothic atmosphere that puts almost every event in a suspicious light, the false clues that set mainly the reader on the wrong track, the theories about the essence of

Creative usages: Among the challenging cases, two tokens proved to illustrate rather creative usages of the target lemma. (71) is a reformulation of the proverb “All roads lead to Rome”, in Dutch ‘alle wegen leiden naar Rome’; however, the context is not large enough to indicate what motivates the play on words (maybe an issue with the railway company). One annotator assigned spoor_6 ‘train’ with high confidence, the second spoor_2 ‘traces-evidence’ with medium confidence, and the third one reported hesitating between spoor_1 ‘footprint’ and spoor_2 ‘traces-evidence’, leaning towards the latter. (72) was assigned spoor_2 ‘traces-evidence’, spoor_4 ‘trail-figurative’ and geen ‘none of the above’ reporting insufficient context, always with minimum or low confidence. This usage is probably linked to the one in (68), referring to a way of thinking, so that “smal spoor” ‘narrow path’ is linked to “op de korte termijn” ‘the short term’.

  (71) Je kunt moeilijk beweren dat het een exclusieve Vlaamse aangelegenheid is , want alle sporen lopen naar Wallonië . " ( PhT )
    You can hardly claim that it is an exclusively Flemish issue, because all tracks run to Wallonia. " (PhT)
  (72) . Prima , maar helaas bekijken de critici ons van op een erg smal spoor . Vooral de media focussen haast uitsluitend op de korte termijn , becommentariëren liever
    . Great, but unfortunately the critics look at us from a very narrow path. Especially the media focus almost exclusively on the short term, prefer to comment

Hard to parse contexts: In 5 of the challenging cases the context is really not clear enough: they are either titles (a memoir, “Sporen trekken door strategische jaren”, and a memo, “De derde eeuw spoor”) or, given the current window size and the format (without paratextual information), the text is almost nonsensical. (73) was assigned spoor_5 ‘railway’, spoor_2 ‘traces-evidence’ and geen ‘none of the above’ reporting unclear context. The sentence “Kunst op het spoor” could well be interpreted as either “Art on the railway” or “Art on the trail”, although the latter interpretation is made less likely by the lack of an argument (on the trail of what?) and the former more likely by the presence of NMBS, the Flemish name of the Belgian railway company, at the end of the concordance; no annotator selected it as a relevant context word. (74), on the other hand, is made quite confusing by the parentheses, as one of the annotators pointed out. The text seems to be a list of definitions: one annotator assigned spoor_2 ‘traces-evidence’ with minimum confidence, another one spoor_5 ‘railway’ with medium confidence, a third one reported insufficient context with minimum confidence, and the last one reported, with high confidence, that the word was actually buitensporig ‘excessive’ rather than the target lemma. Finally, (75) has typographical issues: some spaces are missing, making it hard to read for both humans and computers; moreover, it looks like a table of contents, so that even the fragments of text that can be extracted from it are not full sentences. The expression “op het juiste spoor” ‘on the right track’ clearly belongs to spoor_4 ‘trail-figurative’, which is the majority sense of this token, but some confusion on the annotators’ part, and also on the computer’s part, would be reasonable.

  1. Kunst op het spoor ARCHEOLOGIE Van onze redacteur Geert Sels HOEGAARDEN – Voor de NMBS de
    Art on the tracks ARCHEOLOGY From our editor Geert Sels HOEGAARDEN – For the NMBS (National Railway Company of Belgium) the
  2. ’ , concurrent : mededinger om de beste positie , extreem : buiten ( gangbare ) sporen ( sporig ) , boycotten : dwars-bomen , in de clinch , in onmin ( minnen
    ’, competitor: contender for the best position, extreme: outside (acceptable) tracks (sporig), boycott: thwart, in the clinch, at odds (love
  3. 16 pagina’s sportbijlagekoning van kermiskoersen nu kampioen16 pagina’s werchter-bijlageop het juiste spoor de top-20 van de weispetterend couleur café cultuur , pagina’s 9 en 10
    16 pages sports supplementking of fairground races now champion16 pages werchter supplementon the right track the top-20 of the meadowsplashing couleur café culture, pages 9 and 10

Reasonable ambiguity: A number of tokens illustrate the expression “sporen nalaten/achterlaten” ‘leave traces/footprints’: they are often not literal footprints, but the sense relates more to an impression made than to a trail to follow, or the distinction seems less relevant. Annotators tend to split between spoor_1 ‘footprint’ and spoor_2 ‘traces-evidence’, which is to be expected. In three tokens from the same batch, among which (76), two annotators selected one of those senses and the other one geen ‘none of the above’, reporting the instance as an idiomatic expression; in three other tokens of “sporen nalaten” from another batch, the four annotators were equally split between the two senses, although differently in (77) than in the other two.

  1. De overname van Titan , dochter van het Nederlandse Internoc , door uitzendgroep Creyf’s liet haar sporen na . De verkoper , genoteerd op de Brusselse nieuwe markt , won 29,03
    The takeover of Titan, a subsidiary of the Dutch Internoc, by the staffing group Creyf’s left its traces. The seller, listed on the new market of Brussels, won 29.03
  2. doel niet enkel het leven zelf was . Je moest door je leven een spoor achterlaten als je verdwenen was , iets waar de andere mensen het beter door zouden krijgen
    goal was not only life itself. Through your life you had to leave a trace for when you were gone, something through which other people would be better off

Nephology of spoor

A first impression of the clouds relates to the stress values of the dimensionality reduction and the parameters that make the strongest distinctions between models. We have 144 models of spoor, created on 11/03/2020, modeling between 342 and 360 tokens. The stress value of the NMDS solution for the cloud of models is 0.146.
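As a point of reference, below is a minimal sketch of how such a stress value could be obtained with non-metric MDS; the distance matrix is a random placeholder standing in for the actual distances between the 144 models, so the code illustrates the procedure rather than reproducing the reported value.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# placeholder distances between 144 models (the real matrix would come
# from comparing the models themselves)
rng = np.random.default_rng(0)
dist = squareform(pdist(rng.random((144, 10)), metric="cosine"))

# non-metric MDS on the precomputed distance matrix
# (normalized_stress requires scikit-learn >= 1.2)
nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           normalized_stress=True, random_state=0)
coords = nmds.fit_transform(dist)
print(round(nmds.stress_, 3))  # Kruskal stress-1 of the 2D solution
```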

Strength of parameters

The most dividing parameter is FOC-POS, splitting the cloud of models along the vertical dimension. Within each half, the clearest divisions are made by PPMI, with PPMI:selection between the other two values. In the FOC-POS:all area, which is more dispersed, the SOC-WIN:10 models tend towards the center (Figure 59). The stress values of the NMDS solutions of these models range between 0.236 and 0.279.

Figure 59. Cloud of models of 'spoor' colored by `PPMI`. Explore it <a href='https://montesmariana.github.io/NephoVis/level1.html?type=spoor'> here</a>.

Figure 60 illustrates the interaction between FOC-POS and PPMI in the distance between pairs of models by number of shared parameters. In pairs of models that share PPMI:no or PPMI:weight, those that share FOC-POS are more similar to each other and those that don’t are much more different, with distances being lower with PPMI:weight. Distances are much higher with PPMI:selection, even higher than for pairs of models with PPMI:weight | PPMI:selection; like between the pairs of models with different PPMI, distances between models that share FOC-POS:nav are the lowest, and between those that share FOC-POS:all they are sometimes as high as between those that don’t share FOC-POS. Moreover, as for other lemmas, models that share SOC-WIN:4 are more different from each other than those that share SOC-WIN:10 or don’t share SOC-WIN at all.
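A hedged sketch of the bookkeeping behind this kind of plot, assuming a table of model settings and a precomputed model-to-model distance lookup (both toy placeholders here, not the actual data):

```python
from itertools import combinations
import pandas as pd

# toy parameter settings, one row per model
models = pd.DataFrame(
    {"FOC-POS": ["nav", "all", "nav", "all"],
     "PPMI": ["weight", "weight", "no", "selection"],
     "SOC-WIN": ["4", "10", "4", "10"]},
    index=["m1", "m2", "m3", "m4"])

# toy model-to-model distances (in practice, read from the distance matrix)
dist = {("m1", "m2"): 0.42, ("m1", "m3"): 0.31, ("m1", "m4"): 0.55,
        ("m2", "m3"): 0.61, ("m2", "m4"): 0.47, ("m3", "m4"): 0.58}

rows = []
for a, b in combinations(models.index, 2):
    same = models.loc[a] == models.loc[b]
    rows.append({"shared": int(same.sum()),              # x-axis: n shared parameters
                 "same_FOC_POS": bool(same["FOC-POS"]),  # color
                 "same_PPMI": bool(same["PPMI"]),        # facet
                 "distance": dist[(a, b)]})              # y-axis
pairs = pd.DataFrame(rows)
print(pairs.groupby("shared")["distance"].mean())
```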

Figure 60. Distances between models of 'spoor' by number of shared parameters, colored by `FOC-POS` and split by `PPMI`.

Zooming in on the models that share all but one parameter, the plots in Figure 61 show that, as in other lemmas, the first order parameters have the largest individual impacts and PPMI:weight models tend to have the smallest distances. Unlike in other lemmas, PPMI:selection models have the largest distances in all cases, and FOC-POS:all increases the difference between models with different PPMI and SOC-WIN. Not shown here, because the picture is so similar to previous cases, is that SOC-WIN:4 models have the largest distances across almost all groups.

Figure 61. Distances between models of 'spoor' that vary along only one parameter, colored by `PPMI` and `FOC-POS`.

First order filters

Figure 62 shows the quantitative effect of the first order filters. The panels to the left show the number of remaining tokens (top) and first order context words (bottom) after applying each first order filter, and the right panel shows the number of remaining context words per token after applying each filter.

Only FOC-POS, especially in combination with the other filters, leaves tokens out: up to 5% are lost when all filters are in action. However, both FOC-WIN and PPMI lower the total number of context words much more, without reducing the number of context words per token as much (the median stays around 5).
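To make the PPMI-based filtering concrete, here is a minimal sketch over a toy co-occurrence matrix; the counts and the cut-off at zero are illustrative assumptions, not the actual association measures used for these models.

```python
import numpy as np

# toy co-occurrence counts: rows = first order context words, cols = collocates
counts = np.array([[10., 2., 0.],
                   [3., 8., 1.],
                   [0., 1., 5.]])

total = counts.sum()
p_ij = counts / total                            # joint probabilities
p_i = counts.sum(axis=1, keepdims=True) / total  # marginals (rows)
p_j = counts.sum(axis=0, keepdims=True) / total  # marginals (columns)

with np.errstate(divide="ignore"):
    pmi = np.log2(p_ij / (p_i * p_j))
ppmi = np.where(counts > 0, np.maximum(pmi, 0.0), 0.0)

# PPMI:selection keeps only context words with positive PMI, ignoring the values;
# PPMI:weight keeps the same words but weights them by their PPMI instead
keep = ppmi > 0
print(ppmi.round(2))
```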

Figure 62. Remaining tokens and context words of 'spoor' after application of first order filters.

Model comparison

In order to compare the strongest parameters, a selection of models with SOC-POS:nav + SOC-WIN:4 + LENGTH:FOC will be examined, initially discarding PPMI:selection.

The distance matrix between these models has rather large values, with the lowest (between 0.36 and 0.43) representing the differences between models that only differ in FOC-WIN (Distance matrix 11). When PPMI:no is replaced by PPMI:selection, the picture is very different indeed: the smallest distances are between FOC-POS:nav models that only differ in PPMI, and the largest between models with different FOC-POS and PPMI (Distance matrix 12). Comparing PPMI:no | PPMI:selection models instead, the most different models are those with FOC-POS:all + PPMI:selection.

Distance matrix 11. Distance matrix between some models of ‘spoor’

| id | FOC-POS | PPMI   | FOC-WIN |
|----|---------|--------|---------|
| 1  | nav     | weight | 10_10   |
| 2  | nav     | no     | 10_10   |
| 3  | all     | weight | 10_10   |
| 4  | all     | no     | 10_10   |
| 5  | nav     | weight | 5_5     |
| 6  | nav     | no     | 5_5     |
| 7  | all     | weight | 5_5     |
| 8  | all     | no     | 5_5     |

Distance matrix 12. Distance matrix between some models of ‘spoor’

| id | FOC-POS | PPMI      | FOC-WIN |
|----|---------|-----------|---------|
| 1  | nav     | weight    | 10_10   |
| 2  | nav     | selection | 10_10   |
| 3  | all     | weight    | 10_10   |
| 4  | all     | selection | 10_10   |
| 5  | nav     | weight    | 5_5     |
| 6  | nav     | selection | 5_5     |
| 7  | all     | weight    | 5_5     |
| 8  | all     | selection | 5_5     |
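As an aside, the sketch below shows one way a model-to-model distance could be computed, assuming each model is a token-by-feature matrix over the same tokens; the rank-correlation measure between token-level distances is an illustrative assumption, not necessarily the measure used in this study.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

# placeholders: two models as token-by-feature matrices over the same tokens
rng = np.random.default_rng(0)
model_a = rng.random((360, 50))
model_b = rng.random((360, 50))

def model_distance(m1, m2):
    """1 - rank correlation between the token-to-token distances of two models."""
    d1 = pdist(m1, metric="cosine")  # condensed token-token distances, model 1
    d2 = pdist(m2, metric="cosine")  # same token pairs, model 2
    rho, _ = spearmanr(d1, d2)
    return 1 - rho

print(round(model_distance(model_a, model_b), 2))
```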

In the NMDS solutions, FOC-POS:all + PPMI:selection | PPMI:no models are sensitive to the outlier reproduced in (???), while the rest seem to be equally spread out around a core of variable density. The “spur” and “railway” tokens seem to stick together, as do those from spoor_3 ‘traces-substance’ and spoor_4 ‘trail-figurative’, while those of spoor_1 ‘footprint’ and spoor_2 ‘traces-evidence’ are all over the place.

In the t-SNE solutions, all models look like archipelagos with perplexity 5 and like uniform masses with at most one or two bumps with perplexity 50; with perplexity 20, PPMI:no models have a large central cloud and at least two smaller clusters (one with “urine” as shared context word and one with “van de daders ontbreekt elk spoor” ‘there are no traces of the criminals’), while the PPMI:weight models have a larger number of small clusters close to each other. Perplexity 30 spreads the tokens more, but clusters are still visible, more densely packed in PPMI:weight than in PPMI:no; PPMI:selection models are in between, with clearer clusters than in PPMI:no but not as neatly separated as in PPMI:weight. Because the “traces” homonym is the most frequent, it takes up most of the clouds, but the “spur” homonym has its own island in all models, while the “railway” tokens have a clearer cluster in the PPMI:weight models (Figure 63). There is a cluster of spoor_1 ‘footprint’ and spoor_2 ‘traces-evidence’ corresponding to the expression “sporen nalaten”; there are some for spoor_3 ‘traces-substance’, spoor_4 ‘trail-figurative’, spoor_8 ‘spur’ and spoor_5 ‘railway-main’, as well as a rather populated one with cases of “spoor van de daders”.
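The perplexity sweep described above can be sketched as follows; the token vectors are random placeholders, so only the procedure, not the shapes of the actual clouds, is reproduced here.

```python
import numpy as np
from sklearn.manifold import TSNE

# placeholder vectors standing in for ~360 tokens of 'spoor'
rng = np.random.default_rng(0)
tokens = rng.random((360, 50))

# sweep over the perplexity values discussed above
solutions = {}
for perplexity in (5, 20, 30, 50):
    tsne = TSNE(n_components=2, perplexity=perplexity,
                init="pca", random_state=0)
    solutions[perplexity] = tsne.fit_transform(tokens)
# low perplexity favours many small islands ("archipelagos");
# high perplexity pulls everything into one more uniform mass
```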

Figure 63. Tokens of 'spoor' in the t-SNE solutions (perplexity 30) of the selected models

The model with the neatest clusters seems to be the one with FOC-WIN:10 + FOC-POS:nav + PPMI:weight; inspecting different combinations of second order parameters did not reveal any further improvement from other values (the distances between them are also very small). The resulting cloud is illustrated in Figure 64.

Figure 64. Reasonable token cloud of 'spoor'.

For further insight the model in Figure 64 will be inspected.

  1. It would seem our instructions were not clear enough. All three annotators agreed on the sense tag for this instance of horde, which is “right”, except that it’s not Dutch.

  2. The disagreeing annotator chose the same context words as one of the agreeing ones in (10), namely in (L6), terug (R3), te (R4), vinden (R5), but none in (11), where their colleagues chose die (L0), in (L1), verantwoordelijken (L2), de (L3) and in (L1), sterken (L4), die (L10) respectively.

  3. In batch 8, one annotator assigned stof_3 ‘topic’ to three concordances and stof_4 ‘dust’ to a fourth, making them the majority sense in each case; in the one case in batch 7, an annotator who had assigned stof_3 ‘topic’ in both cases of batch 2 chose geen ‘none of the above’.

  4. There are others too: three of the stof_4 ‘dust’ tokens with stof_3 ‘topic’ as alternative illustrate the expressions “in het stof kruipen” ‘humble oneself before someone, lit. crawl in the dust’, “in het stof bijten” ‘bite the dust’ and “van onder het stof halen” ‘bring something out again, lit. take from under the dust’ respectively. Another one is an example of “het stof is gaan liggen” ‘the dust has settled’, which is also found with the verb “neerdalen” in a stof_3 ‘topic’ token with stof_4 ‘dust’ as alternative.

  5. At a certain point the question came up whether some annotators should just be ignored because their annotations bordered on the ridiculous. Initial plots showing the percentage of disagreement of each annotator suggest that those cases are actually negligible: they don’t occur that often. In addition, these examples show that eccentricities are not necessarily a bad thing. The same annotator that suggested spoor_3 ‘traces-substance’ for (64) was the only annotator to select spoor_8 ‘spur’ for (62): having a number of annotators does help compensate for human error through aggregation.